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Experimental set up by Prof. Libero Zuppiroli and Philippe Bugnon, 
Laboratory of Optoelectronics Molecular Materials, EPFL, Lausanne, 
Switzerland. 



The photograph captures an experiment first described by Isaac Newton in Opticks in 1730, explaining how white light can be split into its color components and then resynthesized. It is a physical implementation of the decomposition of white light into its Fourier components, the colors of the rainbow, followed by a synthesis to recover the original. This experiment graphically summarizes the major theme of the book: many signals can be split into essential components, and a signal's characteristics can be better understood by looking at its components, a process called analysis. These components can be used for processing the signal; more importantly, the signal can be perfectly recovered from those same components through the process called synthesis.
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AR Autoregressive 

ARMA Autoregressive moving average 

AWGN Additive white Gaussian noise 

BIBO Bounded input, bounded output 

CDF Cumulative distribution function 

DCT Discrete cosine transform 

DFT Discrete Fourier transform 
Kronecker delta: $\delta_n = 1$ for $n = 0$; $\delta_n = 0$ otherwise
Indicator function of an interval $I$: $\chi_I(t) = 1$ for $t \in I$; $\chi_I(t) = 0$ otherwise
Integration by parts: $\int u\, dv = uv - \int v\, du$
Complex number: $z = a + jb = r e^{j\theta}$
Conjugation: $z^* = a - jb = r e^{-j\theta}$
Conjugation of coefficients but not of $z$ itself: $X_*(z) = X^*(z^*)$
Principal root of unity: $W_N = e^{-j2\pi/N}$
a,b £ 



G[0, 
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Standard Vector Spaces 

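Hilbert space of square-summable sequences: $\ell^2(\mathbb{Z}) = \{x : \mathbb{Z} \to \mathbb{C} \mid \sum_n |x_n|^2 < \infty\}$, with inner product $\langle x, y \rangle = \sum_n x_n y_n^*$

Hilbert space of square-integrable functions: $\mathcal{L}^2(\mathbb{R}) = \{x : \mathbb{R} \to \mathbb{C} \mid \int |x(t)|^2\, dt < \infty\}$, with inner product $\langle x, y \rangle = \int x(t)\, y(t)^*\, dt$

Normed vector space of sequences with finite $p$ norm, $1 \le p < \infty$: $\ell^p(\mathbb{Z}) = \{x : \mathbb{Z} \to \mathbb{C} \mid \sum_n |x_n|^p < \infty\}$, with norm $\|x\|_p = \left(\sum_n |x_n|^p\right)^{1/p}$

Normed vector space of functions with finite $p$ norm, $1 \le p < \infty$: $\mathcal{L}^p(\mathbb{R}) = \{x : \mathbb{R} \to \mathbb{C} \mid \int |x(t)|^p\, dt < \infty\}$, with norm $\|x\|_p = \left(\int |x(t)|^p\, dt\right)^{1/p}$

Normed vector space of bounded sequences with supremum norm: $\ell^\infty(\mathbb{Z}) = \{x : \mathbb{Z} \to \mathbb{C} \mid \sup_n |x_n| < \infty\}$, with norm $\|x\|_\infty = \sup_n |x_n|$

Normed vector space of bounded functions with supremum norm: $\mathcal{L}^\infty(\mathbb{R}) = \{x : \mathbb{R} \to \mathbb{C} \mid \sup_t |x(t)| < \infty\}$, with norm $\|x\|_\infty = \sup_t |x(t)|$

Bases and Frames for Sequences

Standard Euclidean basis: $\{e_n\}$, with $e_{n,k} = 1$ for $k = n$, and 0 otherwise
Vector, element of basis or frame: $\varphi$; when applicable, a column vector
Basis or frame: $\Phi$; set of vectors $\{\varphi_n\}$
Operator: $\Phi$; concatenation of the $\varphi_n$s in a linear operator: $[\varphi_0\ \varphi_1\ \ldots\ \varphi_{N-1}]$
Vector, element of dual basis or frame: $\tilde{\varphi}$; when applicable, a column vector
Dual basis or frame: $\tilde{\Phi}$; set of vectors $\{\tilde{\varphi}_n\}$
Operator: $\tilde{\Phi}$; concatenation of the $\tilde{\varphi}_n$s in a linear operator: $[\tilde{\varphi}_0\ \tilde{\varphi}_1\ \ldots\ \tilde{\varphi}_{N-1}]$
Expansion in a basis or frame: $x = \Phi \tilde{\Phi}^* x$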
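The expansion $x = \Phi \tilde{\Phi}^* x$ can be checked numerically on small examples. The sketch below assumes NumPy and two illustrative systems in $\mathbb{R}^2$ of our own choosing: a nonorthogonal basis and a tight frame of three unit vectors spaced 120 degrees apart. It forms each dual, analyzes with $\tilde{\Phi}^*$ and synthesizes with $\Phi$; the helper name `expand` is hypothetical.

```python
import numpy as np

# Illustrative basis for R^2: columns phi_0 = (1, 0) and phi_1 = (1, 1)
Phi_basis = np.array([[1.0, 1.0],
                      [0.0, 1.0]])
# Dual basis chosen so that Phi_dual^* = Phi^{-1}
Phi_basis_dual = np.linalg.inv(Phi_basis).conj().T

# Illustrative tight frame for R^2: three unit vectors 120 degrees apart
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
Phi_frame = np.vstack([np.cos(angles), np.sin(angles)])        # 2 x 3 operator
# Canonical dual frame: (Phi Phi^*)^{-1} Phi
Phi_frame_dual = np.linalg.inv(Phi_frame @ Phi_frame.conj().T) @ Phi_frame

def expand(x, Phi, Phi_dual):
    """Analysis with Phi_dual^*, then synthesis with Phi: returns Phi Phi_dual^* x."""
    coeffs = Phi_dual.conj().T @ x
    return Phi @ coeffs

x = np.array([2.0, -1.0])
x_from_basis = expand(x, Phi_basis, Phi_basis_dual)
x_from_frame = expand(x, Phi_frame, Phi_frame_dual)
```

For this tight frame $\Phi \Phi^* = \tfrac{3}{2} I$, so the canonical dual is simply $\tfrac{2}{3}\Phi$; for the basis, the dual is the unique one and there is no redundancy.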
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Transforms 

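Fourier transform: $x(t) \stackrel{\mathrm{FT}}{\longleftrightarrow} X(\omega)$
$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt, \qquad x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega$

Fourier series: $x(t) \stackrel{\mathrm{FS}}{\longleftrightarrow} X_k$
$X_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, e^{-j(2\pi/T)kt}\, dt, \qquad x(t) = \sum_{k\in\mathbb{Z}} X_k\, e^{j(2\pi/T)kt}$

Discrete-time Fourier transform: $x_n \stackrel{\mathrm{DTFT}}{\longleftrightarrow} X(e^{j\omega})$
$X(e^{j\omega}) = \sum_{n\in\mathbb{Z}} x_n\, e^{-j\omega n}, \qquad x_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$

Discrete Fourier transform: $x_n \stackrel{\mathrm{DFT}}{\longleftrightarrow} X_k$
$X_k = \sum_{n=0}^{N-1} x_n\, W_N^{kn}, \qquad x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k\, W_N^{-kn}$

Local Fourier transform: $x(t) \stackrel{\mathrm{LFT}}{\longleftrightarrow} X(\Omega, \tau)$
$X(\Omega, \tau) = \int_{-\infty}^{\infty} x(t)\, p(t - \tau)\, e^{-j\Omega t}\, dt, \qquad x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} X(\Omega, \tau)\, g_{\Omega,\tau}(t)\, d\Omega\, d\tau$

Continuous wavelet transform: $x(t) \stackrel{\mathrm{CWT}}{\longleftrightarrow} X(a, b)$
$X(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}(t)\, dt, \qquad x(t) = \frac{1}{C_\psi}\int_{0}^{\infty}\int_{-\infty}^{\infty} X(a, b)\, \psi_{a,b}(t)\, \frac{db\, da}{a^2}$

Wavelet series: $x(t) \stackrel{\mathrm{WS}}{\longleftrightarrow} \beta_k^{(\ell)}$
$\beta_k^{(\ell)} = \int_{-\infty}^{\infty} x(t)\, \psi_{\ell,k}(t)\, dt, \qquad x(t) = \sum_{\ell\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} \beta_k^{(\ell)}\, \psi_{\ell,k}(t)$

Discrete wavelet transform: $x_n \stackrel{\mathrm{DWT}}{\longleftrightarrow} \alpha_k^{(J)}, \beta_k^{(J)}, \ldots, \beta_k^{(1)}$
$x_n = \sum_{k\in\mathbb{Z}} \alpha_k^{(J)}\, g_{n-2^J k}^{(J)} + \sum_{\ell=1}^{J}\sum_{k\in\mathbb{Z}} \beta_k^{(\ell)}\, h_{n-2^\ell k}^{(\ell)}$

Discrete cosine transform: $x_n \stackrel{\mathrm{DCT}}{\longleftrightarrow} X_k$

z-transform: $x_n \stackrel{\mathcal{Z}}{\longleftrightarrow} X(z)$
$X(z) = \sum_{n\in\mathbb{Z}} x_n\, z^{-n}$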
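As a quick check of the sign and normalization conventions for the DFT pair, the following sketch (assuming NumPy; the helper names `dft` and `idft` are ours) evaluates both formulas directly from $W_N = e^{-j2\pi/N}$. NumPy's `numpy.fft.fft` uses the same convention, so the two should agree.

```python
import numpy as np

def dft(x):
    """DFT in the convention above: X_k = sum_n x_n W_N^{kn}, W_N = exp(-j 2 pi / N)."""
    N = len(x)
    n = np.arange(N)
    W_kn = np.exp(-2j * np.pi * np.outer(n, n) / N)      # matrix of W_N^{kn}
    return W_kn @ x

def idft(X):
    """Inverse DFT: x_n = (1/N) sum_k X_k W_N^{-kn}."""
    N = len(X)
    k = np.arange(N)
    W_minus_kn = np.exp(2j * np.pi * np.outer(k, k) / N)  # matrix of W_N^{-kn}
    return W_minus_kn @ X / N

x = np.random.default_rng(0).standard_normal(8)
X = dft(x)
```

The direct matrix form costs $O(N^2)$ operations and is meant only to expose the definition; in practice one calls an FFT.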
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Discrete-Time Nomenclature 

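Sequence: $x_n$ (signal, vector)
Convolution
linear: $h * x$, with $(h * x)_n = \sum_{k\in\mathbb{Z}} x_k\, h_{n-k} = \sum_{k\in\mathbb{Z}} h_k\, x_{n-k}$
circular: $h \circledast x$, with $(h \circledast x)_n = \sum_{k=0}^{N-1} x_k\, h_{(n-k) \bmod N} = \sum_{k=0}^{N-1} h_k\, x_{(n-k) \bmod N}$
$(h * x)_n$: $n$th element of the convolution result
Eigensequence: $v$ (eigenfunction, eigenvector)
infinite time: $v_n = e^{j\omega n}$, with $h * v = H(e^{j\omega})\, v$
finite time: $v_n = e^{j2\pi kn/N}$, with $h \circledast v = H_k\, v$
Frequency response: eigenvalue corresponding to $v_n$
infinite time: $H(e^{j\omega}) = \sum_{n\in\mathbb{Z}} h_n\, e^{-j\omega n}$
finite time: $H_k = \sum_{n=0}^{N-1} h_n\, e^{-j2\pi kn/N} = \sum_{n=0}^{N-1} h_n\, W_N^{kn}$

Continuous-Time Nomenclature

Function: $x(t)$ (signal)
Convolution
linear: $h * x$, with $(h * x)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau$
circular: $h \circledast x$, with $(h \circledast x)(t) = \int_{0}^{T} x(\tau)\, h(t - \tau)\, d\tau = \int_{0}^{T} h(\tau)\, x(t - \tau)\, d\tau$
$(h * x)(t)$: convolution result at $t$
Eigenfunction: $v(t)$ (eigenvector)
infinite time: $v(t) = e^{j\omega t}$, with $h * v = H(\omega)\, v$
finite time: $v(t) = e^{j2\pi kt/T}$, with $h \circledast v = H_k\, v$
Frequency response: eigenvalue corresponding to $v(t)$
infinite time: $H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt$
finite time: $H_k = \int_{-T/2}^{T/2} h(\tau)\, e^{-j2\pi k\tau/T}\, d\tau$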



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavelets.org



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovacevic, and V. K. Goyal 



Quick Reference 



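Two-Channel Filter Banks

Basic characteristics: number of channels $M = 2$; sampling factor $N = 2$; channel sequences $\alpha_n$, $\beta_n$
Filters (synthesis lowpass, synthesis highpass; analysis lowpass, analysis highpass):
orthogonal: $g_n$, $h_n$; $g_{-n}$, $h_{-n}$
biorthogonal: $g_n$, $h_n$; $\tilde{g}_n$, $\tilde{h}_n$
polyphase components: $g_{0,n}, g_{1,n}$ and $h_{0,n}, h_{1,n}$; $\tilde{g}_{0,n}, \tilde{g}_{1,n}$ and $\tilde{h}_{0,n}, \tilde{h}_{1,n}$

Tree-Structured Filter Banks (DWT)

Basic characteristics: number of channels $M = J + 1$; sampling at level $\ell$: $N^{(\ell)} = 2^{\ell}$; channel sequences $\alpha_n^{(J)}$, $\beta_n^{(\ell)}$
Filters (synthesis lowpass, synthesis bandpass; analysis lowpass, analysis bandpass):
orthogonal: $g_n^{(J)}$, $h_n^{(\ell)}$; $g_{-n}^{(J)}$, $h_{-n}^{(\ell)}$
biorthogonal: $g_n^{(J)}$, $h_n^{(\ell)}$; $\tilde{g}_n^{(J)}$, $\tilde{h}_n^{(\ell)}$
polyphase component $j$: $g_{j,n}^{(J)}$, $h_{j,n}^{(\ell)}$; $\tilde{g}_{j,n}^{(J)}$, $\tilde{h}_{j,n}^{(\ell)}$
with $\ell = 1, 2, \ldots, J$ and $j = 0, 1, \ldots$

N-Channel Filter Banks

Basic characteristics: number of channels $M = N$; sampling factor $N$; channel sequences $\alpha_{i,n}$
Filters (synthesis; analysis):
orthogonal filter $i$: $g_{i,n}$; $g_{i,-n}$
biorthogonal filter $i$: $g_{i,n}$; $\tilde{g}_{i,n}$
polyphase component $j$: $g_{i,j,n}$; $\tilde{g}_{i,j,n}$
with $i = 0, 1, \ldots, N-1$ and $j = 0, 1, \ldots, N-1$

Oversampled Filter Banks

Basic characteristics: number of channels $M > N$; sampling factor $N$; channel sequences $\alpha_{i,n}$
Filters (synthesis; analysis):
filter $i$: $g_{i,n}$; $\tilde{g}_{i,n}$
polyphase component $j$: $g_{i,j,n}$; $\tilde{g}_{i,j,n}$
with $i = 0, 1, \ldots, M-1$ and $j = 0, 1, \ldots, N-1$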
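To make the two-channel entries concrete, here is a minimal sketch assuming the Haar pair $g_n = (\delta_n + \delta_{n-1})/\sqrt{2}$ and $h_n = (\delta_n - \delta_{n-1})/\sqrt{2}$ as an example of orthogonal filters; the filter choice and the function names are ours, and NumPy is assumed. Analysis filters with $g_{-n}$ and $h_{-n}$ and downsamples by $N = 2$; synthesis upsamples by 2, filters with $g_n$ and $h_n$, and adds. The input length is assumed even, so these length-2 filters need no boundary handling.

```python
import numpy as np

g = np.array([1.0, 1.0]) / np.sqrt(2)    # synthesis lowpass g_n, n = 0, 1 (Haar example)
h = np.array([1.0, -1.0]) / np.sqrt(2)   # synthesis highpass h_n, n = 0, 1

def analyze(x, f):
    """Filter with f_{-n}, then downsample by 2: out_k = sum_n x_n f_{n-2k}."""
    K = len(x) // 2
    out = np.zeros(K)
    for k in range(K):
        for i, f_i in enumerate(f):      # n = 2k + i covers the support of f_{n-2k}
            if 2 * k + i < len(x):
                out[k] += x[2 * k + i] * f_i
    return out

def synthesize(alpha, beta, g, h, N):
    """Upsample by 2, filter, add: x_n = sum_k alpha_k g_{n-2k} + beta_k h_{n-2k}.

    Assumes g and h have the same length.
    """
    x = np.zeros(N)
    for k in range(len(alpha)):
        for i in range(len(g)):
            if 2 * k + i < N:
                x[2 * k + i] += alpha[k] * g[i] + beta[k] * h[i]
    return x

x = np.random.default_rng(0).standard_normal(16)
alpha = analyze(x, g)                    # lowpass channel sequence alpha_n
beta = analyze(x, h)                     # highpass channel sequence beta_n
x_hat = synthesize(alpha, beta, g, h, len(x))
```

Because the filters are orthogonal, the channel sequences preserve energy, $\|\alpha\|^2 + \|\beta\|^2 = \|x\|^2$, and synthesis recovers $x$ exactly.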
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Preface 



The aim of this book is to provide a set of tools for users of state-of-the-art signal processing technology and a solid foundation for those hoping to advance the theory and practice of signal processing. Many of the results and techniques presented here, while rooted in classic Fourier techniques for signal representation, first appeared during a flurry of activity in the 1980s and 1990s. New constructions for local Fourier transforms and orthonormal wavelet bases during that period were motivated both by theoretical interest and by applications, in particular in multimedia communications. New bases with specified time-frequency behavior were found, with impact well beyond the original fields of application. Areas as diverse as computer graphics and numerical analysis embraced some of the new constructions, which is no surprise given the pervasive role of Fourier analysis in science and engineering.

Now that the dust has settled, some of what was new and esoteric has become fundamental. Our motivation is to bring these new fundamentals to a broader audience to further expand their impact. We thus provide an integrated view of classical Fourier analysis of signals and systems alongside structured representations with time-frequency locality and their myriad applications.

Structure of the Book The book is divided into two parts, the first on foundations and the second on structured representations for signal processing, connected via a bridge, the Intermezzo. We have decided to publish the two parts separately, so that the book can be used as soon as possible. While the first part/book can be used independently to cover the foundations of signals and systems, the second part/book relies heavily on the base built in the first. Thus, these two books are to be seen as integrally related to each other.

Part I: Foundations of Signals and Systems This part reviews the necessary mathematical material to make the book self-contained. For many readers, this material might be well known; for others, it will be new and thus welcome. It is a refresher of the basic mathematical concepts used in signal processing and communications. Thus, in Chapter 1, From Euclid to Hilbert, the basic geometric intuition central to Hilbert spaces is reviewed, together with all the necessary tools underlying the construction of bases. Chapter 2, Sequences and Discrete-Time Systems, is a crash course on processing signals in discrete time or discrete space. In Chapter 3, Functions and Continuous-Time Systems, the mathematics of Fourier transforms and
Fourier series is reviewed. Chapter 4, Sampling and Interpolation, discusses the critical link between discrete and continuous domains as given by the sampling theorem and interpolation, while Chapter 5, Approximation and Compression, veers from the exact world to the approximate one. The final chapter in Part I, Chapter 6, Time-Frequency Localization, considers the time-frequency behavior of the abstract representation objects studied thus far.

Intermezzo: Bridging Parts I and II This short interlude recaps the tools seen up to that point and discusses issues arising in the real world, as well as ways of adapting these tools for use there. The main concepts seen in Part I (the geometry of Hilbert spaces, the existence of bases, Fourier representations, sampling and interpolation, and approximation and compression) build a powerful foundation for modern signal processing. These tools hit roadblocks they must overcome: finiteness and localization, the limitations imposed by uncertainty, and computational costs.

Part II: Structured Representations for Signal Processing This part presents 
signal representations, including Fourier, local Fourier and wavelet bases, related 
constructions, as well as frames and continuous transforms. 

It starts with Chapter 7, Filter Banks: Building Blocks of Time-Frequency Expansions, which presents a thorough treatment of the basic block, the two-channel filter bank: a signal processing device that splits a signal into a coarse, lowpass approximation and a highpass detail.

We generalize this block in the three chapters that follow, all dealing with Fourier- and wavelet-like representations on sequences. In Chapter 8, Local Fourier Bases on Sequences, we discuss Fourier-like bases on sequences, implemented by N-channel modulated filter banks (first generalization of the two-channel filter banks). In Chapter 9, Wavelet Bases on Sequences, we discuss wavelet-like bases on sequences, implemented by tree-structured filter banks (second generalization). In Chapter 10, Local Fourier and Wavelet Frames on Sequences, we discuss both Fourier- and wavelet-like frames on sequences, implemented by oversampled filter banks (third generalization).

We then move to the two chapters dealing with Fourier- and wavelet-like representations on functions. In Chapter 11, Local Fourier Transforms, Frames and Bases on Functions, we start with the most natural representation of smooth functions with some locality, the local Fourier transform, followed by its sampled version/frame, and leading to results on whether bases are possible. In Chapter 12, Wavelet Bases, Frames and Transforms on Functions, we do the same for wavelet representations on functions, but in the opposite order: starting from bases, going through frames, and finishing with the continuous wavelet transform.

The last chapter, Chapter 13, Approximation, Estimation, and Compression, uses all the tools we introduced to address state-of-the-art signal processing and communication problems and their solutions. The guiding principle is that there is a domain where the problem at hand will have a sparse solution, at least approximately. This is known as sparse signal processing, and many examples, from the classical Karhunen–Loève expansion to nonlinear approximation in discrete cosine
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transform and wavelet domains, all the way to contemporary research in compressed 
sensing, use this principle. The chapter introduces and surveys sparse signal processing, covering approximation methods, estimation procedures such as denoising, as well as compression methods and inverse problems.

Teaching Points Our aim is to present a synthetic view from basic mathematical principles to actual constructions of bases and frames, always with an eye on concrete applications. While the benefit is a self-contained presentation, the cost is a rather sizable manuscript. To aid with teaching, we provide a reading guide with numerous routes through the material; the levels span from elementary to advanced, but in a gradual fashion and with indications of levels of difficulty. Referencing in the main text is sparse; pointers to the bibliography are given in Further Reading at the end of each chapter.

The material grew out of teaching signal processing, wavelets and applications in various settings. Two of the authors, Martin Vetterli and Jelena Kovacevic, authored a graduate textbook, Wavelets and Subband Coding, Prentice Hall, 1995, which they and others used to teach graduate courses at various US and European institutions. This book is available online with open access. With more than a decade of experience, the maturing of the field, and the broader interest arising from and for these topics, the time was right for an entirely new text geared towards a broader audience, one that could be used to span levels from undergraduate to graduate, as well as various areas of engineering and science. As a case in point, parts of the text have been used at Carnegie Mellon University in classes on bioimage informatics, where some of the students are life-sciences majors. This plasticity of the text is one of the features we aimed for, and one that most probably differentiates the present book from many others. Another aim is to present side by side all methods that arose around signal representations, without favoring any in particular. The truth is that each representation is a tool in the toolbox of the practitioner, and the problem or application at hand ultimately determines the appropriate one to use.



Martin Vetterli, Jelena Kovacevic and Vivek K Goyal 

October 2011 



http://waveletsandsubbandcoding.org/
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Release Notes 



This is the alpha 3.0 release of the book. Below, we summarize the level of completeness by chapter.
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Part I 
Intermezzo 
Part II 

Front material 
Back material 



Front matter consists of the following items: 

(i) Image Attribution 

The section gives credit for all the images not produced by the authors. It is complete, listing all the images in the current version.
(ii) Abbreviations 
(iii) Quick Reference 

The section summarizes various concepts used in the book. 
(iv) Preface 
(v) Reading Guide 

The guide lists several roadmaps for how the book could be used.
(vi) From Rainbows to Spectra 

This chapter is complete. 

Part I consists of the following chapters: 

(i) Chapter 1: From Euclid to Hilbert 

This chapter is complete. 
(ii) Chapter 2: Sequences and Discrete-Time Systems 

This chapter is complete. 
(iii) Chapter 3: Functions and Continuous-Time Systems

This chapter is complete. 
(iv) Chapter 4: Sampling and Interpolation 

This chapter is about 90% complete. 
(v) Chapter 5: Approximation and Compression 

This chapter is about 80% complete. 
(vi) Chapter 6: Time-Frequency Localization

This chapter is about 60% complete. 

Intermezzo is a single chapter bridging Parts I and II. It is about 60% complete. 

Part II consists of the following chapters: 

(i) Chapter 7: Filter Banks: Building Blocks of Time-Frequency Expansions
This chapter is about 95% complete. Left to finish is Section 7.6 on two-channel filter banks with stochastic inputs.

(ii) Chapter 8: Local Fourier Bases on Sequences 
This chapter is complete. 

(iii) Chapter 9: Wavelet Bases on Sequences 
This chapter is complete. 

(iv) Chapter 10: Local Fourier and Wavelet Frames on Sequences 

This chapter is about 80% complete. Left to finish are Section 10.6 on computational aspects and a couple of examples.
(v) Chapter 11: Local Fourier Transform, Frames and Bases on Functions 

This chapter is about 20% complete. Section 11.2 on local Fourier transform 
is the only one written. 

(vi) Chapter 12: Wavelet Bases, Frames and Transforms on Functions 

This chapter is about 20% complete. Sections 12.1–12.2 are essentially done; Sections 12.3 and 12.5 are written but need major revisions and have notation that does not agree with the rest of the book. Section 12.4 is yet to be written, as are all the components at the end of the chapter (Chapter at a Glance, Historical Remarks and Further Reading).
(vii) Chapter 13: Approximation, Estimation, and Compression 

An outline of the chapter is included; the chapter is yet to be written. 

Back matter consists of the following items: 

(i) Bibliography 

The bibliography is complete in the current version. 
(ii) Index 

The index will be generated at the very end. 
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From Rainbows to Spectra 



In the 13th century, the Dominican monk, theologian and physicist Dietrich von Freiberg performed a simple experiment: he held a spherical bottle filled with water in the sunlight. The bottle played the role of a single water drop, and, by following the trajectory of the light that it refracted and reflected, he gave a scientific explanation of the rainbow effect, including the secondary rainbow with its weaker, reversed colors. He wrote his conclusions in one of his famous treatises, De iride (On the Rainbow), "probably the most dramatic development of 14th- and 15th-century optics" [63].

Von Freiberg fell short of a complete understanding of the rainbow phenomenon because, like many of his contemporaries, he believed that colors were simply intensities between black and white. A full understanding emerged three hundred years later when Descartes and Newton explained that dispersion separates white light into spectral components of different wavelengths: the colors of the rainbow.

This brings us to a central theme of this book: decomposing an entity into its 
constituent components can be a key step in understanding its essential character. 
This decomposition can enable even more, such as modifications in the decomposed 
state. The rainbow's appearance is explained by the fact that sunlight contains a 
combination of all wavelengths within the visible range; separation of white light 
by wavelength, as with a prism, enables modifications prior to recombination. The 
collection of wavelengths is, as we will see, the spectrum. 

A French physicist and mathematician, Joseph Fourier, formalized the notion of the spectrum in the early 19th century. He was interested in the heat equation, the differential equation governing the diffusion of heat. Fourier's key insight was to decompose a periodic function $x(t) = x(t + T)$ into an infinite sum of sines and cosines of periods $T/k$, $k \in \mathbb{Z}^+$. Since these sine and cosine components are eigenfunctions of the heat equation, the solution of the problem is simplified: one can analyze the differential equation for each component separately and combine the intermediate results, thanks to the linearity of the system. Fourier's decomposition earned him a coveted prize from the French Academy of Sciences, but with a mention that his work lacked rigor. Indeed, the question of which functions admit a Fourier decomposition is a deep one, and it took many years to settle. Fourier's work is at
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the heart of the present book — for both its strengths and its weaknesses. 

Signal Representations The idea of a decomposition and a possible modification in the decomposed state leads to signal representations, where signals can be sequences (discrete domain) or functions (continuous domain). Just as Fourier used sines and cosines for decomposition, we can imagine using other functions with particular properties. Call these basis vectors and denote them by $\varphi_k$, $k \in \mathbb{Z}$. Then
$$x = \sum_{k\in\mathbb{Z}} X_k\, \varphi_k \qquad (0.1)$$
is called an expansion of $x$ in terms of $\{\varphi_k\}_{k\in\mathbb{Z}}$.

Orthonormal Bases When the basis vectors form an orthonormal set, that is,
$$\langle \varphi_k, \varphi_i \rangle = \delta_{k-i},$$
the coefficients $X_k$ are obtained from the function $x$ and the basis vectors $\varphi_k$ through an inner product
$$X_k = \langle x, \varphi_k \rangle. \qquad (0.2)$$

For example, Fourier's construction of a series representation for periodic functions with period $T = 1$ can be written as
$$x(t) = \sum_{k\in\mathbb{Z}} X_k\, e^{j2\pi kt}, \qquad (0.3a)$$
where
$$X_k = \int_0^1 x(t)\, e^{-j2\pi kt}\, dt. \qquad (0.3b)$$

We can define basis vectors $\varphi_k$, $k \in \mathbb{Z}$, on the interval $[0, 1)$ as
$$\varphi_k(t) = e^{j2\pi kt}, \quad 0 \le t < 1, \qquad (0.4)$$
and the Fourier series coefficients as
$$X_k = \langle x, \varphi_k \rangle = \int_0^1 x(t)\, \varphi_k^*(t)\, dt = \int_0^1 x(t)\, e^{-j2\pi kt}\, dt,$$
exactly the same as (0.3b). The basis vectors form an orthonormal set (the first few are shown in Figure 0.1):
$$\langle \varphi_k, \varphi_i \rangle = \int_0^1 e^{j2\pi kt}\, e^{-j2\pi it}\, dt = \delta_{k-i}. \qquad (0.5)$$
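Relations (0.3b) and (0.5) are easy to check numerically. The sketch below assumes NumPy and approximates the inner products by Riemann sums on a uniform grid of M points on [0, 1), with M chosen by us. On such a grid the complex exponentials are exactly orthogonal up to rounding, so the Gram matrix comes out as the identity, and the coefficients of the square wave that is +1 on [0, 1/2) and −1 on [1/2, 1) come close to the known values $X_k = 2/(j\pi k)$ for odd $k$ and $X_k = 0$ otherwise.

```python
import numpy as np

M = 1024                                   # grid size on [0, 1) (our choice)
t = np.arange(M) / M                       # uniform grid 0, 1/M, ..., (M-1)/M
dt = 1.0 / M

def phi(k, t):
    """Fourier basis vector (0.4): phi_k(t) = exp(j 2 pi k t) on [0, 1)."""
    return np.exp(2j * np.pi * k * t)

def inner(x_vals, y_vals):
    """Riemann-sum approximation of <x, y> = int_0^1 x(t) y(t)^* dt."""
    return np.sum(x_vals * np.conj(y_vals)) * dt

ks = list(range(-3, 4))
gram = np.array([[inner(phi(k, t), phi(i, t)) for i in ks] for k in ks])  # compare with (0.5)

x = np.where(t < 0.5, 1.0, -1.0)           # square wave: +1 on [0, 1/2), -1 on [1/2, 1)
X = {k: inner(x, phi(k, t)) for k in ks}   # approximations of the X_k in (0.3b)
```

The discretization error in the odd coefficients shrinks like 1/M; the Gram matrix, in contrast, is exact for any M larger than the spread of the indices used.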

While the Fourier series is certainly a key orthonormal basis with many outstanding properties, do other bases exist, and what are their properties?

Figure 0.1: The first three Fourier series basis vectors on the interval [0, 1): (a) $\varphi_0(t) = 1$; (b) $\varphi_1(t) = e^{j2\pi t}$; (c) $\varphi_2(t) = e^{j4\pi t}$. Real parts are shown with solid lines and imaginary parts are shown with dashed lines.

Figure 0.2: Example Haar basis functions for the interval [0, 1): (a) $\psi_{0,0}(t)$; (b) $\psi_{-1,1}(t)$; (c) $\psi_{-2,1}(t)$. The prototype function is $\psi(t) = \psi_{0,0}(t)$.

Early in the 20th century, Alfred Haar proposed a basis which looks quite different from Fourier's. It is based on a function $\psi(t)$ defined as
$$\psi(t) = \begin{cases} 1, & \text{for } 0 \le t < 1/2; \\ -1, & \text{for } 1/2 \le t < 1; \\ 0, & \text{otherwise}. \end{cases} \qquad (0.6)$$



For the interval [0, 1), we can build an orthonormal system by scaling $\psi(t)$ by powers of 2, and then shifting the scaled versions appropriately, yielding
$$\psi_{m,n}(t) = 2^{-m/2}\, \psi\!\left(\frac{t - n 2^{m}}{2^{m}}\right), \qquad (0.7)$$
with $m \in \{0, -1, -2, \ldots\}$ and $n \in \{0, 1, \ldots, 2^{-m} - 1\}$ (a few are shown in Figure 0.2).
It is quite clear from the figure that the various basis functions are indeed orthogonal 
to each other, as they either do not overlap, or when they do, one changes sign 
over the constant span of the other. We will spend a considerable amount of time 
studying this system in Part II of the book. 

Figure 0.3: Gibbs phenomenon for the Fourier series (0.3a) with 9 (black), 65 (gray) and 513 (red) terms of a square wave (Haar basis function $\psi_{0,0}(t)$ from Figure 0.2(a)): (a) series with 9, 65 and 513 terms; (b) detail of (a).

While the system (0.7) is certainly orthonormal, it cannot be a basis; for example, on [0, 1) there would be no way to reconstruct a constant 1. We remedy that by adding the function
$$\varphi_0(t) = \begin{cases} 1, & \text{for } 0 \le t < 1; \\ 0, & \text{otherwise}, \end{cases}$$
into the mix, yielding an orthonormal basis for the interval [0, 1). This is a very different basis from the Fourier one; for example, instead of being infinitely differentiable, none of the $\psi_{m,n}$s is even continuous. Analogously to (0.3a)–(0.3b), we can now write the expansion
$$x(t) = \langle x, \varphi_0 \rangle\, \varphi_0(t) + \sum_{m=-\infty}^{0} \sum_{n=0}^{2^{-m}-1} X_{m,n}\, \psi_{m,n}(t), \qquad (0.8a)$$
where
$$X_{m,n} = \langle x, \psi_{m,n} \rangle = \int_0^1 x(t)\, \psi_{m,n}(t)\, dt. \qquad (0.8b)$$
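A discrete check of (0.6)–(0.8) is sketched below, assuming NumPy. We assume (our choice) that all functions are piecewise constant on $2^J$ equal cells of [0, 1); then midpoint sums give the exact inner products, and $\varphi_0$ together with $\psi_{m,n}$ for $m = 0, -1, \ldots, -(J-1)$ gives exactly $2^J$ functions. Their Gram matrix should be the identity, and (0.8a)–(0.8b), truncated at that finest scale, should reconstruct such a function exactly. The names `psi_mn`, `B` and the value of `J` are ours.

```python
import numpy as np

J = 6                              # number of dyadic scales kept (illustrative choice)
L = 2 ** J                         # functions are piecewise constant on L equal cells
t = (np.arange(L) + 0.5) / L       # cell midpoints in [0, 1)
dt = 1.0 / L

def psi(u):
    """Haar prototype (0.6): 1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    return np.where((0 <= u) & (u < 0.5), 1.0,
                    np.where((0.5 <= u) & (u < 1), -1.0, 0.0))

def psi_mn(m, n, t):
    """Scaled and shifted Haar function (0.7), for m <= 0."""
    s = 2.0 ** m
    return 2.0 ** (-m / 2) * psi((t - n * s) / s)

phi0 = np.ones(L)                  # phi_0(t) = 1 on [0, 1)

# Rows of B: phi_0, then psi_{m,n} for m = 0, -1, ..., -(J-1) and n = 0, ..., 2^{-m} - 1
B = np.array([phi0] + [psi_mn(m, n, t)
                       for m in range(0, -J, -1)
                       for n in range(2 ** (-m))])

gram = B @ B.T * dt                # midpoint sums are exact inner products here

x = np.random.default_rng(0).standard_normal(L)   # an arbitrary piecewise-constant x(t)
coeffs = B @ x * dt                # <x, phi_0> and the X_{m,n} of (0.8b)
x_rec = coeffs @ B                 # synthesis (0.8a), truncated at scale m = -(J-1)
```

Counting rows confirms the bookkeeping: $1 + \sum_{m=0}^{-(J-1)} 2^{-m} = 2^J$.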



It is natural to ask: Which basis is better? Such a question does not have a 
simple answer, and the answer will depend on the class of functions or sequences we 
wish to represent, as well as our goals in the representation. Furthermore, we will 
have to be careful in describing what we mean by equality in an expansion such as 
(0.3a); otherwise we could be misled the same way Fourier was.

Approximation One way to assess the quality of a basis is to see how well it 
can approximate a given function with a finite number of terms. History is again 
enlightening. Fourier series became such a useful tool during the 19th century 
that researchers built elaborate mechanical devices to compute a function based on 
Fourier series coefficients. They built analog computers, based on harmonically- 
related rotating wheels, where amplitudes of Fourier coefficients could be set and 
the sum computed. One such machine, the Harmonic Integrator, was designed by 
the physicists Michelson and Stratton, and could compute a series with 80 terms. 
To the designers' dismay, the synthesis of a square wave from its Fourier series led to 
oscillations around the discontinuity that would not go away even as they increased 
the number of terms; they concluded that a mechanical problem was at fault. Not 
until 1899, when Gibbs proved that Fourier series of discontinuous functions cannot 
converge uniformly, was this myth dispelled. The phenomenon was termed the Gibbs phenomenon, referring to the oscillations appearing around the discontinuity when using a finite number of terms (see Figure 0.3 for an example).
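The Gibbs phenomenon can be reproduced in a few lines. Assuming NumPy, the sketch below sums (0.3a) for the square wave $\psi_{0,0}(t)$, whose coefficients are $X_0 = 0$, $X_k = 2/(j\pi k)$ for odd $k$, and $X_k = 0$ for even $k \neq 0$; each pair $k, -k$ combines into the real term $(4/(\pi k))\sin(2\pi k t)$. Keeping $|k| \le K$ gives $2K + 1$ terms, so K = 4, 32, 256 correspond to the 9, 65 and 513 terms of Figure 0.3. The function name `partial_sum` and the grid are ours.

```python
import numpy as np

def partial_sum(t, K):
    """Fourier series (0.3a) of the square wave psi_{0,0}(t), keeping |k| <= K.

    X_0 = 0, X_k = 2 / (j pi k) for odd k and 0 for even k != 0, so each pair
    (k, -k) contributes the real term (4 / (pi k)) sin(2 pi k t).
    """
    x = np.zeros_like(t)
    for k in range(1, K + 1, 2):           # odd k only
        x += 4.0 / (np.pi * k) * np.sin(2 * np.pi * k * t)
    return x

# On [0, 1/2] the partial sums are symmetric about t = 1/4, so the first
# (largest) peak after the jump at t = 0 lies in [0, 1/4].
t = np.linspace(0.0, 0.25, 100001)
peaks = {2 * K + 1: partial_sum(t, K).max() for K in (4, 32, 256)}  # keys: 9, 65, 513 terms
```

As K grows, the peak moves closer to the jump but its height does not decrease: it tends to about 1.179 instead of 1, an overshoot of roughly 9% of the jump of size 2, which is why adding terms could not make the oscillations go away.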



Figure 0.4: Approximation of a square wave with a Haar basis using the first 8 (black), 64 (gray) and 512 (red) coefficients, with $m = 0, -1, -2, \ldots$ and $n = 0, \ldots, 2^{-m} - 1$: (a) series with 8, 64 and 512 terms; (b) detail of (a). The discontinuity is at the irrational point $1/\sqrt{2}$.

Figure 0.5: Approximation of a square wave with a Haar basis using the 8 (black) and 15 (red) largest-magnitude coefficients (in the neighborhood of the discontinuity): (a) series with 8 and 15 terms; (b) detail of (a). The 15-term approximation is visually indistinguishable from the target function.



So what would the Haar basis provide in this case? It seems more appropriate for a square wave. Unfortunately, taking the first $2^{-m}$ coefficients in the natural ordering (the coefficient corresponding to the function $\varphi_0(t)$ plus the $2^{-m} - 1$ coefficients of the coarsest scales, starting from scale 0 and proceeding to scales $-1, -2, \ldots$) leads to similarly poor performance, as shown in Figure 0.4.

However, changing the approximation procedure slightly makes a big difference. By retaining the largest coefficients in absolute value instead of simply keeping a fixed set of coefficients, the approximation quality improves drastically, as seen in Figure 0.5. Compare Figures 0.3 and 0.4, where the Fourier approximations have 5, 33, and 255 nonzero terms, with the similar-quality Haar approximations that have only 4, 7, and 10 nonzero terms.

Through this comparison, we have illustrated how the quality of a basis for approximation can depend on the method of approximation. Retaining a predefined set of coefficients, as in the Fourier example (Figure 0.3) or the first Haar example (Figure 0.4), is called linear approximation. Retaining an adaptive set of coefficients instead, as in the second Haar example (Figure 0.5), is called nonlinear approximation and here leads to superior approximation quality.
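The difference between linear and nonlinear approximation can be sketched as follows, assuming NumPy. Our assumptions: the step function equals 1 on [0, 1/√2) and 0 elsewhere, and it is sampled on $2^J$ cells, which moves the jump to the nearest cell boundary; the Haar basis is built from (0.6)–(0.7) plus $\varphi_0$, in its natural ordering. Because the basis is orthonormal, the squared error equals the energy of the discarded coefficients, so keeping the K largest-magnitude coefficients can never do worse than keeping the first K. The names `B`, `c` and `error` are ours.

```python
import numpy as np

J = 10                             # resolution: 2**J cells on [0, 1) (illustrative choice)
L = 2 ** J
t = (np.arange(L) + 0.5) / L       # cell midpoints
dt = 1.0 / L

def psi(u):
    """Haar prototype (0.6)."""
    return np.where((0 <= u) & (u < 0.5), 1.0,
                    np.where((0.5 <= u) & (u < 1), -1.0, 0.0))

# Haar basis in natural ordering: phi_0, then scales m = 0, -1, ..., -(J-1)
B = np.array([np.ones(L)] + [2.0 ** (-m / 2) * psi((t - n * 2.0 ** m) / 2.0 ** m)
                             for m in range(0, -J, -1)
                             for n in range(2 ** (-m))])

x = (t < 1 / np.sqrt(2)).astype(float)   # step near 1/sqrt(2); amplitude 1 is assumed
c = B @ x * dt                            # all expansion coefficients

def error(kept):
    """L2 error on [0, 1) when only the coefficients with indices in `kept` are retained."""
    c_kept = np.zeros_like(c)
    c_kept[kept] = c[kept]
    return np.sqrt(np.sum((x - c_kept @ B) ** 2) * dt)

K = 8
linear = np.arange(K)                     # fixed set: phi_0 and scales m = 0, -1, -2
nonlinear = np.argsort(-np.abs(c))[:K]    # adaptive set: the K largest |coefficients|
err_linear, err_nonlinear = error(linear), error(nonlinear)
```

Since each scale has at most one Haar function whose support straddles the jump, at most J + 1 coefficients are nonzero; among the first 8 only 4 are, which is why the fixed set wastes most of its budget.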
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The central theme of the book is the design of expansions with certain features. 
While not the only criterion used to compare expansions, approximation quality 
arises repeatedly, and we will see that it is closely related to the central signal 
processing tasks of sampling, filtering, estimation and compression. 
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Part I: Foundations of 
Signals and Systems 



The first part of the book defines classes of signals and the basics of representing 
these signals. This lays the foundation for the second part of the book, in which 
representations with specific desirable properties are developed and applied. 

Chapter 1, From Euclid to Hilbert, introduces the basic machinery of Hilbert spaces. These are vector spaces endowed with operations that induce intuitive geometric properties. In this general setting, we develop the notion of signal representations, which are essentially coordinate systems for the vector space. When a representation is complete and not redundant, it provides a basis for the space; when it is complete but redundant, it provides a frame for the space. A key virtue for a basis is orthonormality; its counterpart for a frame is tightness.

Chapters 2 and 3 narrow our attention to sequence and function spaces that are common in practice while introducing the concept of time, leading to an inherent ordering not necessarily present in a general Hilbert space. In Chapter 2, Sequences and Discrete-Time Systems, a vector is a sequence that depends on discrete time, and an important class of linear operators on these vectors consists of those that are invariant to time shifts. These operators lead naturally to signal representations using the discrete-time Fourier transform and, for finite-length sequences, the discrete Fourier transform.

Chapter 3, Functions and Continuous-Time Systems, parallels Chapter 2. A vector is now a function that depends on continuous time, and an important class of linear operators on these vectors again consists of those that are invariant to time shifts. These operators lead naturally to signal representations using the Fourier transform and, for circularly extended finite-length functions or, equivalently, periodic functions, the Fourier series. The four Fourier representations from these two chapters exemplify the diagonalization of linear, shift-invariant operators, or convolutions, in the various domains.
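The diagonalization mentioned above can be seen concretely in the finite-length case. In the sketch below (NumPy assumed; N and the random filter are arbitrary choices of ours), the circulant matrix C implements the circular convolution $h \circledast x$. Conjugating C by the DFT matrix with entries $W_N^{kn}$ yields a diagonal matrix whose entries are the frequency response $H_k = \sum_n h_n W_N^{kn}$; equivalently, each Fourier vector $v_n = e^{j2\pi kn/N}$ is an eigenvector of C with eigenvalue $H_k$.

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
h = rng.standard_normal(N)                     # an arbitrary filter of length N
n = np.arange(N)

# Circulant matrix: (C x)_n = sum_k h_{(n-k) mod N} x_k, i.e., the circular convolution h ⊛ x
C = h[(n[:, None] - n[None, :]) % N]

F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix with entries W_N^{kn}
H = F @ h                                      # frequency response H_k = sum_n h_n W_N^{kn}
D = F @ C @ np.linalg.inv(F)                   # expected: diag(H_0, ..., H_{N-1})

k = 3                                          # any frequency index works
v = np.exp(2j * np.pi * k * n / N)             # Fourier vector v_n = e^{j 2 pi k n / N}

x = rng.standard_normal(N)
y = C @ x                                      # h ⊛ x computed in the time domain
```

In practice the same fact is used in reverse: $h \circledast x$ is computed as the inverse DFT of the pointwise product $H_k X_k$, at FFT cost rather than the $O(N^2)$ of the matrix product.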

Chapter 4, Sampling and Interpolation, makes fundamental connections between Chapters 2 and 3. Associating a discrete-time sequence with a given continuous-time function is sampling, and the converse is interpolation; these are central concepts in signal processing, as the world is essentially continuous, while digital computations are made on sequences.

Chapter 5, Approximation and Compression, introduces many types
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of approximations that are central in making computationally-practical tools. Ap- 
proximation by polynomials and by truncations of series expansions are studied 
along with the basic principles of compression. 

Chapter 6, Time-Frequency Localization, introduces time, frequency, scale, and
resolution properties of individual vectors, before we construct sets of vectors with
which to represent signals in subsequent chapters. These properties build our
intuition for what might or might not be possible as properties of representations.
In particular, time and frequency localization lead to the concept of a
time-frequency plane, where essential differences between Fourier techniques and
wavelet techniques become evident: (1) Fourier techniques use vectors with equal
spacing in frequency while wavelet techniques do not; and (2) Fourier techniques
use vectors at equal scale while wavelet techniques use geometrically spaced scales.
Chapter 1

From Euclid to Hilbert

Contents

1.C Elements of Probability
Further Reading
Exercises with Solutions
Exercises

We start our journey into the world of Fourier and wavelets with different 
backgrounds and perspectives. This chapter aims to establish a common language, 
develop the foundations for our study, and begin to draw out key themes. 

There will be more formal definitions in this chapter than in any other, to
approach the ideal of a self-contained treatment. However, we must assume some
common background: on the one hand, we expect the reader to be familiar with
linear algebra at the level of [140, Ch. 1-5] and probability at the level of
[11, Ch. 1-4]. (The textbooks we have cited are just examples; nothing unique to
those books is necessary.) On the other hand, we are not assuming prior knowledge
of general vector space abstractions or mathematical analysis beyond basic calculus;
we develop these topics here to extend geometric intuition from ordinary Euclidean
space to spaces of sequences and functions. For more details on abstract vector
spaces, we recommend books by Kreyszig [93], Luenberger [98], and Young [176].

1.1 Introduction 

This section introduces many topics of the chapter through the familiar setting of
the real plane. In the more general treatment of subsequent sections, the intuition
we have developed through years of dealing with the Euclidean spaces around us
($\mathbb{R}^2$ and $\mathbb{R}^3$) will generalize to some not-so-familiar spaces.
Readers comfortable with vector spaces, inner products, norms, projections, and
bases may skip this section; otherwise, this will be a gentle introduction to
Euclid's world.

Real Plane as a Vector Space 

Let us start with a look at the familiar setting of $\mathbb{R}^2$, that is, real
vectors with two coordinates. We adopt the convention of vectors being columns and
often write them compactly as transposes of rows, such as $x = [x_0 \;\; x_1]^\top$.
The first entry is the horizontal component and the second entry is the vertical
component.

Adding two vectors in the plane produces a third one also in the plane;
multiplying a vector by a real scalar produces a second vector also in the plane.
These two ingrained facts make the real plane a vector space.

Inner Product and Norm 

The inner product of vectors $x = [x_0 \;\; x_1]^\top$ and $y = [y_0 \;\; y_1]^\top$
in the real plane is

$$\langle x, y\rangle = x_0 y_0 + x_1 y_1. \tag{1.1}$$

Other names for inner product are scalar product and dot product. The inner
product of a vector with itself is simply

$$\langle x, x\rangle = x_0^2 + x_1^2,$$

a nonnegative quantity that is zero when $x_0 = x_1 = 0$. The norm of a vector $x$ is

$$\|x\| = \sqrt{\langle x, x\rangle}. \tag{1.2}$$

While the norm is sometimes called the length, we avoid this usage because length 
can also refer to the number of components in a vector. A vector with norm 1 is 
called a unit vector. 
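As a quick numerical check of (1.1) and (1.2), here is a minimal Python sketch,
assuming vectors in $\mathbb{R}^2$ are stored as tuples `(x0, x1)`; the helper names
`inner` and `norm` are our own, not notation from the text.

```python
import math

def inner(x, y):
    """Inner product (1.1) of two vectors in the real plane."""
    return x[0] * y[0] + x[1] * y[1]

def norm(x):
    """Norm (1.2): square root of the inner product of x with itself."""
    return math.sqrt(inner(x, x))

x = (3.0, 4.0)
y = (1.0, -2.0)
print(inner(x, y))   # 3*1 + 4*(-2) = -5.0
print(norm(x))       # sqrt(9 + 16) = 5.0
```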

In (1.1), the inner product computation depends on the choice of coordinate
axes. Let us now derive an expression in which the coordinates disappear. Consider
$x$ and $y$ as shown in Figure 1.1. Define the angle between $x$ and the positive
horizontal axis as $\theta_x$ (measured counterclockwise), and define $\theta_y$
similarly. Using a little algebra and trigonometry, we get

$$\begin{aligned}
\langle x, y\rangle &= x_0 y_0 + x_1 y_1 \\
&= (\|x\| \cos\theta_x)(\|y\| \cos\theta_y) + (\|x\| \sin\theta_x)(\|y\| \sin\theta_y) \\
&= \|x\|\,\|y\|\,(\cos\theta_x \cos\theta_y + \sin\theta_x \sin\theta_y) \\
&= \|x\|\,\|y\| \cos(\theta_x - \theta_y).
\end{aligned} \tag{1.3}$$
Figure 1.1: A pair of vectors in $\mathbb{R}^2$.
Thus, the inner product of the two vectors is the product of their norms and the
cosine of the angle $\theta = \theta_x - \theta_y$ between them.

The inner product measures both the norms of the vectors and the similarity
of their orientations. For fixed vector norms, the greater the inner product, the
closer the vectors are in orientation. The orientations are closest when the vectors
are collinear and pointing in the same direction, that is, when $\theta = 0$; they are
farthest when the vectors are antiparallel, that is, when $\theta = \pi$. When
$\langle x, y\rangle = 0$, the vectors are called orthogonal or perpendicular. From
(1.3), we see that $\langle x, y\rangle$ is zero only when the norm of one vector is
zero (meaning one of the vectors is the vector $[0 \;\; 0]^\top$) or the cosine of the
angle between them is zero ($\theta = \pm\pi/2$). So at least in the latter case, this
is consistent with the conventional concept of perpendicularity.

The distance between two vectors is defined as the norm of their difference: 



$$d(x, y) = \|x - y\| = \sqrt{\langle x - y, x - y\rangle} = \sqrt{(x_0 - y_0)^2 + (x_1 - y_1)^2}. \tag{1.4}$$
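The coordinate-free form (1.3) and the distance (1.4) can be checked the same way.
In this sketch (the helper names `polar_inner` and `distance` are ours), the angles
$\theta_x$ and $\theta_y$ are recovered with `math.atan2`, and the pair is chosen so
that $y$ is $x$ rotated by $\pi/2$, hence orthogonal to it.

```python
import math

def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

def norm(x):
    return math.sqrt(inner(x, x))

def polar_inner(x, y):
    """Right-hand side of (1.3): ||x|| ||y|| cos(theta_x - theta_y)."""
    theta_x = math.atan2(x[1], x[0])  # counterclockwise angle from the horizontal axis
    theta_y = math.atan2(y[1], y[0])
    return norm(x) * norm(y) * math.cos(theta_x - theta_y)

def distance(x, y):
    """Distance (1.4): the norm of the difference x - y."""
    return norm((x[0] - y[0], x[1] - y[1]))

x = (3.0, 4.0)
y = (-4.0, 3.0)                           # x rotated by pi/2
print(inner(x, y))                        # 0.0: orthogonal
print(abs(polar_inner(x, y)) < 1e-12)     # True: (1.3) agrees up to rounding
print(distance(x, y))                     # sqrt(50), about 7.071
```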



Subspaces and Projections 

A line through the origin is the simplest case of a subspace, and projection to a 
subspace is intimately related to inner products. 

Starting with a vector $x$ and applying an orthogonal projection operator onto
some subspace results in the vector $\hat{x}$ closest to $x$ (among all vectors in the
subspace). The connection to orthogonality is that the difference $x - \hat{x}$
between the vector and its orthogonal projection is orthogonal to every vector in
the subspace. Orthogonal projection is illustrated in Figure 1.2(a). The subspace
$S$ is formed by the scalar multiples of the vector $\varphi$, and three orthogonal
projections onto $S$ are shown. As depicted, the action of the operator is like
looking at the shadow that the input vector casts on $S$ when light rays are
orthogonal to $S$. This operation is linear, meaning that the orthogonal projection
of $x + y$ equals the sum of the orthogonal projections of $x$ and $y$. Also, the
orthogonal projection operator leaves vectors in $S$ unchanged.

Figure 1.2: Examples of projections onto the subspace $S$ specified by the unit
vector $\varphi$. (a) Orthogonal projections onto $S$. (b) Oblique projections onto $S$.

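Given a unit vector $\varphi$, the orthogonal projection onto the subspace specified
by $\varphi$ is $\hat{x} = \langle x, \varphi\rangle \varphi$. This can also be written as

$$\hat{x} = \langle x, \varphi\rangle\, \varphi = (\|x\|\,\|\varphi\| \cos\theta)\, \varphi \overset{(a)}{=} (\|x\| \cos\theta)\, \varphi, \tag{1.5}$$

where (a) uses $\|\varphi\| = 1$, and $\theta$ is the angle measured counterclockwise
from $\varphi$ to $x$, as marked in Figure 1.2(a). When $\varphi$ is not of unit norm,
the orthogonal projection onto the subspace specified by $\varphi$ is

$$\hat{x} \overset{(a)}{=} (\|x\| \cos\theta)\, \frac{\varphi}{\|\varphi\|} = \frac{\|x\|\,\|\varphi\| \cos\theta}{\|\varphi\|^2}\, \varphi \overset{(b)}{=} \frac{1}{\|\varphi\|^2}\, \langle x, \varphi\rangle\, \varphi, \tag{1.6}$$

where (a) expresses the orthogonal projection using the unit vector
$\varphi/\|\varphi\|$, and (b) uses (1.3).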
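A minimal sketch of the orthogonal projection (1.6) onto the line spanned by a
nonzero vector $\varphi$, assuming tuple vectors in $\mathbb{R}^2$ (the function name
`project` is hypothetical). It checks the two properties stated above: the residual
$x - \hat{x}$ is orthogonal to $\varphi$, and vectors already in $S$ are left unchanged.

```python
def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

def project(x, phi):
    """Orthogonal projection (1.6) of x onto the line spanned by a nonzero phi:
    x_hat = <x, phi> / ||phi||^2 * phi.  For unit-norm phi this is (1.5)."""
    c = inner(x, phi) / inner(phi, phi)
    return (c * phi[0], c * phi[1])

x = (2.0, 1.0)
phi = (1.0, 1.0)                       # deliberately not of unit norm
x_hat = project(x, phi)
residual = (x[0] - x_hat[0], x[1] - x_hat[1])
print(x_hat)                           # (1.5, 1.5)
print(inner(residual, phi))            # 0.0: residual is orthogonal to S
print(project(x_hat, phi) == x_hat)    # True: vectors in S are unchanged
```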

Projection is more general than orthogonal projection; for example,
Figure 1.2(b) illustrates oblique projection. The operator is still linear and vectors
in the subspace are still left unchanged; however, the difference $x - \hat{x}$ is no
longer orthogonal to $S$.



Bases and Coordinates 

We defined the real plane as a vector space using coordinates: the first coordinate is
the signed distance as measured from left to right, and the second coordinate is the
signed distance as measured from bottom to top. In doing so, we implicitly used
the standard basis $e_0 = [1 \;\; 0]^\top$, $e_1 = [0 \;\; 1]^\top$, which is a particular
orthonormal basis of $\mathbb{R}^2$. Expressing vectors in a variety of bases is central
to our study, and vectors' coordinates will differ depending on the choice of basis.

Orthonormal Bases Vectors $e_0 = [1 \;\; 0]^\top$ and $e_1 = [0 \;\; 1]^\top$
constituting the standard basis are depicted in Figure 1.3(a). They are orthogonal
and of unit norm and are thus called orthonormal. We have been using this basis
implicitly in that

$$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = x_0 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_1 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = x_0 e_0 + x_1 e_1 \tag{1.7}$$

is an expansion of $x$ with respect to the basis $\{e_0, e_1\}$. For this basis, it is
obvious that an expansion exists for any $x$ because the coefficients of the
expansion $x_0$ and $x_1$ are simply "read off" of $x$.

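The general condition for $\{\varphi_0, \varphi_1\}$ to be an orthonormal basis is

$$\langle \varphi_i, \varphi_k\rangle = \delta_{i-k} \quad \text{for } i, k \in \{0, 1\}, \tag{1.8}$$

where $\delta_{i-k}$ is a convenient shorthand defined as²

$$\delta_{i-k} = \begin{cases} 1, & \text{for } i = k; \\ 0, & \text{otherwise}. \end{cases} \tag{1.9}$$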

From the $i \neq k$ cases, the basis vectors are orthogonal to each other; from the
$i = k$ cases, they are of unit norm. With any orthonormal basis
$\{\varphi_0, \varphi_1\}$, one can uniquely find the coefficients of the expansion

$$x = \alpha_0 \varphi_0 + \alpha_1 \varphi_1$$

simply through the inner products

$$\alpha_0 = \langle x, \varphi_0\rangle \quad \text{and} \quad \alpha_1 = \langle x, \varphi_1\rangle.$$

The resulting coefficients satisfy

$$|\alpha_0|^2 + |\alpha_1|^2 = \|x\|^2 \tag{1.10}$$

by the Pythagorean theorem, because $\alpha_0$ and $\alpha_1$ form sides of a right
triangle with hypotenuse of length $\|x\|$ (Figure 1.3(a)). The equality (1.10) is an
example of Parseval's equality³ and is related to Bessel's inequality; these will be
formally introduced in Section 1.5.2.
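To illustrate expansion in an orthonormal basis and Parseval's equality (1.10), the
following sketch uses a hypothetical basis of our own choosing, the standard basis
rotated by 30 degrees; any orthonormal basis of $\mathbb{R}^2$ would do.

```python
import math

def inner(x, y):
    """Inner product (1.1) in R^2."""
    return x[0] * y[0] + x[1] * y[1]

# Hypothetical orthonormal basis: the standard basis rotated by 30 degrees.
a = math.pi / 6
phi0 = (math.cos(a), math.sin(a))
phi1 = (-math.sin(a), math.cos(a))

x = (2.0, -1.0)
# Coefficients are obtained through inner products with the basis vectors.
alpha0, alpha1 = inner(x, phi0), inner(x, phi1)
# Reconstruction x = alpha0 phi0 + alpha1 phi1.
x_rec = (alpha0 * phi0[0] + alpha1 * phi1[0],
         alpha0 * phi0[1] + alpha1 * phi1[1])

print(x_rec)                       # approximately (2.0, -1.0)
print(alpha0**2 + alpha1**2)       # approximately 5.0 = ||x||^2, as in (1.10)
```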

Biorthogonal Pairs of Bases Expansions like (1.7) do not necessarily need
$\{\varphi_0, \varphi_1\}$ to be orthonormal. As an example, consider the problem of
representing an arbitrary vector $x = [x_0 \;\; x_1]^\top$ as an expansion
$\alpha_0 \varphi_0 + \alpha_1 \varphi_1$ with respect to $\varphi_0 = [1 \;\; 0]^\top$ and
$\varphi_1 = [\tfrac{1}{2} \;\; 1]^\top$ (see Figure 1.3(b)). This is not as trivial an
exercise as expanding with the standard basis, but we can still come up with an
intuitive procedure.

Figure 1.3: Expansions in $\mathbb{R}^2$. (a) Expansion in an orthonormal basis.
(b) Expansion in a nonorthogonal basis. (c) Basis $\{\varphi_0, \varphi_1\}$ and its dual
$\{\tilde\varphi_0, \tilde\varphi_1\}$.

2 $\delta_n$ is called the Kronecker delta sequence and is formally defined in Chapter 2, (2.7).
3 What we call Parseval's equality in this book is sometimes called Plancherel's equality as well.

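Since $\varphi_0$ has no vertical component, we should use $\varphi_1$ to match the
vertical component of $x$, yielding $\alpha_1 = x_1$. (This is illustrated with the
diagonal dashed line in Figure 1.3(b).) Then, we need $\alpha_0 = x_0 - \tfrac{1}{2} x_1$
for the horizontal component to be correct. We can express what we have just done
with inner products as

$$\alpha_0 = \langle x, \tilde\varphi_0\rangle \quad \text{and} \quad \alpha_1 = \langle x, \tilde\varphi_1\rangle,$$

where

$$\tilde\varphi_0 = [1 \;\; -\tfrac{1}{2}]^\top \quad \text{and} \quad \tilde\varphi_1 = [0 \;\; 1]^\top,$$

with vectors $\tilde\varphi_0$ and $\tilde\varphi_1$ as shown in Figure 1.3(c). We have thus
just derived an instance of the expansion formula

$$x = \alpha_0 \varphi_0 + \alpha_1 \varphi_1 = \langle x, \tilde\varphi_0\rangle \varphi_0 + \langle x, \tilde\varphi_1\rangle \varphi_1, \tag{1.11}$$

where $\{\tilde\varphi_0, \tilde\varphi_1\}$ is the basis dual to the basis $\{\varphi_0, \varphi_1\}$,
and the two bases form a biorthogonal pair of bases. For any basis, the dual basis
is unique. The defining characteristic for a biorthogonal pair is

$$\langle \varphi_i, \tilde\varphi_k\rangle = \delta_{i-k} \quad \text{for } i, k \in \{0, 1\}. \tag{1.12}$$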



You can check that this is satisfied in our example and that any orthonormal basis
is its own dual. Clearly, designing a biorthogonal basis pair has more degrees of
freedom than designing an orthonormal basis. The disadvantage is that (1.10) does
not hold, and furthermore, computations can become numerically unstable if
$\varphi_0$ and $\varphi_1$ are too close to collinear.

An expansion like (1.11) is often termed a change of basis, since it expresses $x$
with respect to $\{\varphi_0, \varphi_1\}$ rather than in the standard basis $\{e_0, e_1\}$.
In other words, the coefficients $(\alpha_0, \alpha_1)$ are the coordinates of $x$ in this
new basis $\{\varphi_0, \varphi_1\}$.
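The biorthogonal example can be verified numerically, taking
$\varphi_0 = [1 \;\; 0]^\top$, $\varphi_1 = [\tfrac{1}{2} \;\; 1]^\top$ and the duals
$\tilde\varphi_0 = [1 \;\; -\tfrac{1}{2}]^\top$, $\tilde\varphi_1 = [0 \;\; 1]^\top$. This
sketch (variable names are ours) checks (1.12), reconstructs $x$ with (1.11), and
shows that the coefficient energy differs from $\|x\|^2$, so (1.10) fails for this pair.

```python
def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

phi = [(1.0, 0.0), (0.5, 1.0)]          # basis {phi0, phi1}
phi_dual = [(1.0, -0.5), (0.0, 1.0)]    # dual basis {phi~0, phi~1}

# Biorthogonality (1.12): <phi_i, phi~_k> is 1 when i == k and 0 otherwise.
gram = [[inner(phi[i], phi_dual[k]) for k in range(2)] for i in range(2)]
print(gram)                             # [[1.0, 0.0], [0.0, 1.0]]

x = (3.0, 2.0)
alpha = [inner(x, d) for d in phi_dual]  # alpha0 = x0 - x1/2, alpha1 = x1
x_rec = tuple(sum(alpha[k] * phi[k][n] for k in range(2)) for n in range(2))
print(alpha, x_rec)                     # [2.0, 2.0] (3.0, 2.0)
print(alpha[0]**2 + alpha[1]**2, inner(x, x))   # 8.0 13.0: (1.10) does not hold
```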



a3.0 [October 2011] CC by-nc-nd Comments to book-errata@FourierAndWavelets.org

Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal






Figure 1.4: Illustrations of overcomplete sets of vectors (frames). (a) Standard
basis plus a vector. (b) Tight frame.



Frames The signal expansion (1.11) has the minimum possible number of terms
to work for every $x \in \mathbb{R}^2$: two terms, because the dimension of the space is
two. It can also be useful to have an expansion of the form

$$x = \langle x, \tilde\varphi_0\rangle \varphi_0 + \langle x, \tilde\varphi_1\rangle \varphi_1 + \langle x, \tilde\varphi_2\rangle \varphi_2. \tag{1.13}$$

Here, an expansion will exist as long as $\varphi_0$, $\varphi_1$, $\varphi_2$ are not all
collinear. Then, even after the set $\{\varphi_0, \varphi_1, \varphi_2\}$ is fixed, there are
infinitely many dual sets $\{\tilde\varphi_0, \tilde\varphi_1, \tilde\varphi_2\}$ such that (1.13)
holds for all $x \in \mathbb{R}^2$. Such redundant sets are called frames and their
(nonunique) dual sets dual frames. This flexibility can be used in various ways.
For example, setting a component of some $\tilde\varphi_i$ to zero could save a
multiplication and an addition in computing an expansion, or the dual, which we
said was not unique, could be chosen to make the coefficients as small as possible.

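As an example, let us start with the standard basis $\{\varphi_0 = e_0, \varphi_1 = e_1\}$,
add a vector $\varphi_2 = -e_0 - e_1$ to it, and see what happens (see Figure 1.4(a)):

$$\varphi_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \varphi_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \varphi_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}. \tag{1.14}$$

As there are now three vectors in $\mathbb{R}^2$, they are linearly dependent; indeed,
$\varphi_2 = -\varphi_0 - \varphi_1$. Moreover, these three vectors must be able to
represent every vector in $\mathbb{R}^2$ since each two-element subset is able to do so.
To show that, we use the expansion
$x = \langle x, \varphi_0\rangle \varphi_0 + \langle x, \varphi_1\rangle \varphi_1$ and add a
zero to it to read: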

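$$x = \langle x, \varphi_0\rangle \varphi_0 + \langle x, \varphi_1\rangle \varphi_1 + \left(\langle x, \varphi_1\rangle - \langle x, \varphi_1\rangle\right) \varphi_0 + \left(\langle x, \varphi_1\rangle - \langle x, \varphi_1\rangle\right) \varphi_1.$$

We now rearrange it slightly:

$$x = \langle x, \varphi_0 + \varphi_1\rangle \varphi_0 + \langle x, 2\varphi_1\rangle \varphi_1 + \langle x, \varphi_1\rangle (-\varphi_0 - \varphi_1) = \sum_{k=0}^{2} \langle x, \tilde\varphi_k\rangle \varphi_k,$$

with $\tilde\varphi_0 = \varphi_0 + \varphi_1$, $\tilde\varphi_1 = 2\varphi_1$, $\tilde\varphi_2 = \varphi_1$.
This expansion is exactly of the form (1.13) and is reminiscent of the one for
biorthogonal pairs of bases, which we have seen earlier, except that the vectors
involved in the expansion are now linearly dependent. This shows that indeed, we
can expand any $x \in \mathbb{R}^2$ in terms of the frame $\{\varphi_0, \varphi_1, \varphi_2\}$ and one
of its possible dual frames $\{\tilde\varphi_0, \tilde\varphi_1, \tilde\varphi_2\}$.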
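A minimal sketch verifying the frame expansion (1.13) for the frame (1.14), with
dual vectors $\tilde\varphi_0 = \varphi_0 + \varphi_1$, $\tilde\varphi_1 = 2\varphi_1$,
$\tilde\varphi_2 = \varphi_1$; `expand` is a helper of our own that computes
$\sum_k \langle x, \tilde\varphi_k\rangle \varphi_k$.

```python
def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

phi = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]     # frame (1.14)
phi_dual = [(1.0, 1.0), (0.0, 2.0), (0.0, 1.0)]  # phi0 + phi1, 2 phi1, phi1

def expand(x, analysis, synthesis):
    """Return sum_k <x, analysis[k]> synthesis[k], the form of (1.13)."""
    coeffs = [inner(x, a) for a in analysis]
    return tuple(sum(c * s[n] for c, s in zip(coeffs, synthesis)) for n in range(2))

print(expand((3.0, -2.0), phi_dual, phi))   # (3.0, -2.0)
```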

Can we now get a frame to somehow mimic an orthonormal basis? Consider

$$\varphi_0 = \begin{bmatrix} \sqrt{2/3} \\ 0 \end{bmatrix}, \quad \varphi_1 = \begin{bmatrix} -1/\sqrt{6} \\ 1/\sqrt{2} \end{bmatrix}, \quad \varphi_2 = \begin{bmatrix} -1/\sqrt{6} \\ -1/\sqrt{2} \end{bmatrix}, \tag{1.15}$$

shown in Figure 1.4(b). By expanding an arbitrary $x = [x_0 \;\; x_1]^\top$, we can
verify that $x = \sum_{k=0}^{2} \langle x, \varphi_k\rangle \varphi_k$ holds for any $x$. The
expansion looks like the orthonormal basis one, where the same set of vectors plays
both roles (inside the inner product and outside). The norm is preserved similarly
to what happens with orthonormal bases
($\sum_{k=0}^{2} |\langle x, \varphi_k\rangle|^2 = \|x\|^2$), except that the norms of the
frame vectors are not 1, but rather $\sqrt{2/3}$. A frame with this property is called
a tight frame. We could have rescaled the frame vectors by $\sqrt{3/2}$ to make them
unit-norm vectors, in which case
$\sum_{k=0}^{2} |\langle x, \varphi_k\rangle|^2 = \tfrac{3}{2}\|x\|^2$, where $3/2$ indicates
the redundancy of the frame (we have $3/2$ times as many vectors as needed for an
expansion in $\mathbb{R}^2$).
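The tight-frame claims can be checked numerically as well. This sketch uses the
vectors (1.15) both for analysis and synthesis, then rescales them by $\sqrt{3/2}$ to
unit norm and measures the ratio of coefficient energy to $\|x\|^2$; the test vector
$x$ is an arbitrary choice of ours.

```python
import math

def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]

# Tight frame (1.15): three vectors of norm sqrt(2/3), spaced 120 degrees apart.
phi = [(math.sqrt(2 / 3), 0.0),
       (-1 / math.sqrt(6), 1 / math.sqrt(2)),
       (-1 / math.sqrt(6), -1 / math.sqrt(2))]

x = (0.7, -1.3)
coeffs = [inner(x, p) for p in phi]
x_rec = tuple(sum(c * p[n] for c, p in zip(coeffs, phi)) for n in range(2))
energy = sum(c * c for c in coeffs)

# Rescale each vector by sqrt(3/2) to unit norm; the energy then scales by 3/2.
unit = [(math.sqrt(1.5) * p[0], math.sqrt(1.5) * p[1]) for p in phi]
energy_unit = sum(inner(x, u) ** 2 for u in unit)

print(x_rec)                       # approximately (0.7, -1.3)
print(energy, inner(x, x))         # both approximately 2.18
print(energy_unit / inner(x, x))   # approximately 1.5, the redundancy
```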

Matrix View of Bases and Frames An expansion with a basis or frame involves 
operations that can be expressed conveniently with matrices. 

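Take the biorthogonal basis expansion formula (1.11). The coefficients in the
expansion are the inner products

$$\begin{aligned}
\alpha_0 &= \langle x, \tilde\varphi_0\rangle = \tilde\varphi_{00}\, x_0 + \tilde\varphi_{01}\, x_1, \\
\alpha_1 &= \langle x, \tilde\varphi_1\rangle = \tilde\varphi_{10}\, x_0 + \tilde\varphi_{11}\, x_1,
\end{aligned}$$

where $\tilde\varphi_0 = [\tilde\varphi_{00} \;\; \tilde\varphi_{01}]^\top$ and
$\tilde\varphi_1 = [\tilde\varphi_{10} \;\; \tilde\varphi_{11}]^\top$. Rewrite the above as a
matrix-vector product:

$$\begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} = \begin{bmatrix} \langle x, \tilde\varphi_0\rangle \\ \langle x, \tilde\varphi_1\rangle \end{bmatrix} = \underbrace{\begin{bmatrix} \tilde\varphi_{00} & \tilde\varphi_{01} \\ \tilde\varphi_{10} & \tilde\varphi_{11} \end{bmatrix}}_{\tilde\Phi^\top} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}.$$

The matrix $\tilde\Phi^\top$ with $\tilde\varphi_0^\top$ and $\tilde\varphi_1^\top$ as rows is
called the analysis operator, and left multiplying by it computes the expansion
coefficients $(\alpha_0, \alpha_1)$ in the basis $\{\varphi_0, \varphi_1\}$. The reconstruction
of $x$ from $(\alpha_0, \alpha_1)$ is through

$$x = \alpha_0 \varphi_0 + \alpha_1 \varphi_1.$$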
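This can be written with a matrix-vector product as

$$x = \alpha_0 \begin{bmatrix} \varphi_{00} \\ \varphi_{01} \end{bmatrix} + \alpha_1 \begin{bmatrix} \varphi_{10} \\ \varphi_{11} \end{bmatrix} = \underbrace{\begin{bmatrix} \varphi_{00} & \varphi_{10} \\ \varphi_{01} & \varphi_{11} \end{bmatrix}}_{\Phi} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix},$$

where $\varphi_0 = [\varphi_{00} \;\; \varphi_{01}]^\top$ and $\varphi_1 = [\varphi_{10} \;\; \varphi_{11}]^\top$.
The matrix $\Phi$ with $\varphi_0$ and $\varphi_1$ as columns is called the synthesis
operator, and left multiplying by it performs the reconstruction of $x$ from
$(\alpha_0, \alpha_1)$.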

The matrix view makes it obvious that the expansion formula (1.11) holds
for any $x \in \mathbb{R}^2$ when $\Phi \tilde\Phi^\top$ is the identity matrix. In other
words, we must have $\tilde\Phi^\top = \Phi^{-1}$, which is equivalent to (1.12). The
inverse exists whenever $\{\varphi_0, \varphi_1\}$ is a basis, and inverting $\Phi$
determines the dual basis $\{\tilde\varphi_0, \tilde\varphi_1\}$.

In the case of an orthonormal basis, $\Phi^{-1} = \Phi^\top$, that is, the matrix-vector
equations above hold with $\tilde\Phi = \Phi$.
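In code, a dual basis can be obtained by inverting the synthesis operator, as just
described. This sketch uses plain Python lists as matrices (the helpers
`inverse_2x2` and `matmul` are ours) and the nonorthogonal basis
$\varphi_0 = [1 \;\; 0]^\top$, $\varphi_1 = [\tfrac{1}{2} \;\; 1]^\top$; the rows of
$\Phi^{-1}$ come out as $\tilde\varphi_0^\top$ and $\tilde\varphi_1^\top$.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]; raises if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns are linearly dependent: not a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

# Synthesis operator Phi with phi0 = [1 0]^T and phi1 = [1/2 1]^T as columns.
Phi = [[1.0, 0.5],
       [0.0, 1.0]]
# Analysis operator Phi~^T = Phi^{-1}; its rows are the dual basis vectors.
Phi_dual_T = inverse_2x2(Phi)
print(Phi_dual_T)                 # rows [1, -1/2] and [0, 1]
print(matmul(Phi, Phi_dual_T))    # the 2x2 identity
```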

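The case of a 3-element frame is similar, with matrices $\Phi$ and $\tilde\Phi$ each
having 2 rows and 3 columns. The validity of the expansion (1.13) hinges on
$\Phi \tilde\Phi^\top$ being the identity matrix, that is, on $\tilde\Phi^\top$ being a right
inverse of $\Phi$. In the example we saw earlier, the frame is

$$\Phi = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix} \tag{1.16a}$$

and its dual frame is

$$\tilde\Phi = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}. \tag{1.16b}$$

Such a right inverse $\tilde\Phi^\top$ is never unique for a redundant frame; thus dual
frames are not unique. For example, the following dual frame

$$\tilde\Phi = \;\ldots \tag{1.16c}$$

would work as well.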
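The frame matrices (1.16a) and (1.16b) can be checked the same way. The sketch
also shows one way, our own construction for illustration, to produce further dual
frames: because $\varphi_0 + \varphi_1 + \varphi_2 = \mathbf{0}$, the vector
$[1 \;\; 1 \;\; 1]^\top$ lies in the null space of $\Phi$, so adding the same vector $v$ to
every dual frame vector leaves $\Phi \tilde\Phi^\top$ unchanged. The shift $v$ is arbitrary.

```python
def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(row) for row in zip(*m)]

Phi = [[1, 0, -1],
       [0, 1, -1]]         # (1.16a): frame vectors as columns
Phi_dual = [[1, 0, 0],
            [1, 2, 1]]     # (1.16b): dual frame vectors as columns

print(matmul(Phi, transpose(Phi_dual)))      # [[1, 0], [0, 1]]

# Add v[i] to every entry of row i, i.e., add the same vector v to each
# dual frame vector (each column).  Since Phi @ [1, 1, 1]^T = 0, the result
# is another dual frame.
v = (2, -3)
other_dual = [[e + v[i] for e in row] for i, row in enumerate(Phi_dual)]
print(other_dual)                            # [[3, 2, 2], [-2, -1, -2]]
print(matmul(Phi, transpose(other_dual)))    # [[1, 0], [0, 1]]
```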

Chapter Outline 

The next several sections follow the progression of topics in this brief
introduction: In Section 1.2, we formally introduce vector spaces and equip them
with inner products and norms. We also give several examples of common vector
spaces. In Section 1.3, we discuss the concept of completeness that turns an inner
product space into a Hilbert space. More importantly, we define the central concept
of orthogonality and then introduce linear operators. We follow with
approximations, projections, and decompositions in Section 1.4. In Section 1.5, we
define bases and frames. This step gives us the tools to analyze signals and to
create approximate representations. Section 1.5.5 develops the matrix view of basis
and frame expansions. Section 1.6 discusses a few algorithms pertaining to the
material covered. The appendices review some elements of analysis and topology,
linear algebra, and probability.

1.2 Vector Spaces 

Sets of mathematical objects can be highly abstract, and imposing the axioms 
of a normed vector space is amongst the simplest ways to induce useful structure. 
Furthermore, we will see that images, audio signals, and many other types of signals 
can be modeled and manipulated well using vector space models. This section 
introduces vector spaces formally, including inner products, norms, and metrics. 
We give pointers to reference texts in Further Reading. 



1.2.1 Definition and Properties 

A vector space is any set with objects, called vectors, that can be added and scaled 
while staying within the set. The formal definition of a vector space needs to specify 
the field of scalars and properties that are required of the vector addition and scalar 
multiplication operations. 



Definition 1.1 (Vector space) A vector space over a field of scalars $\mathbb{C}$ (or
$\mathbb{R}$) is a set of vectors, $V$, together with operations of vector addition and
scalar multiplication. For any $x$, $y$, $z$ in $V$ and $\alpha$, $\beta$ in $\mathbb{C}$ (or
$\mathbb{R}$), these operations must satisfy the following properties:

(i) Commutativity: $x + y = y + x$.

(ii) Associativity: $(x + y) + z = x + (y + z)$ and $(\alpha\beta)x = \alpha(\beta x)$.
(iii) Distributivity: $\alpha(x + y) = \alpha x + \alpha y$ and $(\alpha + \beta)x = \alpha x + \beta x$.

Furthermore, the following hold:

(iv) Additive identity: There exists an element $\mathbf{0}$ in $V$ such that
$x + \mathbf{0} = \mathbf{0} + x = x$ for every $x$ in $V$.
(v) Additive inverse: For every $x$ in $V$, there exists a unique element $-x$ in $V$
such that $x + (-x) = (-x) + x = \mathbf{0}$.
(vi) Multiplicative identity: For every $x$ in $V$, $1 \cdot x = x$.



We have used the bold $\mathbf{0}$ to emphasize that the zero vector is different
from the zero scalar. In later chapters we will drop this distinction. The definition
of a vector space requires the field of scalars to be specified; we opted to carry real
and complex numbers in parallel. This will be true for a number of other definitions
in this chapter as well. We now discuss some common vector spaces.



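$\mathbb{C}^N$: Vector Space of Complex-Valued Finite-Dimensional Vectors

$$\mathbb{C}^N = \left\{ x = [x_0 \;\; x_1 \;\; \ldots \;\; x_{N-1}]^\top \;\middle|\; x_n \in \mathbb{C},\; n \in \{0, 1, \ldots, N-1\} \right\}, \tag{1.17a}$$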
where the vector addition and scalar multiplication are defined componentwise,

$$\begin{aligned}
x + y &= [x_0 \;\; x_1 \;\; \ldots \;\; x_{N-1}]^\top + [y_0 \;\; y_1 \;\; \ldots \;\; y_{N-1}]^\top = [x_0 + y_0 \;\; x_1 + y_1 \;\; \ldots \;\; x_{N-1} + y_{N-1}]^\top, \\
\alpha x &= \alpha\, [x_0 \;\; x_1 \;\; \ldots \;\; x_{N-1}]^\top = [\alpha x_0 \;\; \alpha x_1 \;\; \ldots \;\; \alpha x_{N-1}]^\top.
\end{aligned}$$

It is easy to verify that the six properties in Definition 1.1 hold; $\mathbb{C}^N$ is
thus a vector space (see also Solved Exercise 1.1). The definition of the standard
Euclidean space, $\mathbb{R}^N$, follows similarly, except over $\mathbb{R}$.

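$\mathbb{C}^{\mathbb{Z}}$: Vector Space of Complex-Valued Sequences over $\mathbb{Z}$

$$\mathbb{C}^{\mathbb{Z}} = \left\{ x = [\ldots \;\; x_{-1} \;\; \boxed{x_0} \;\; x_1 \;\; \ldots]^\top \;\middle|\; x_n \in \mathbb{C},\; n \in \mathbb{Z} \right\}, \tag{1.17b}$$

where the vector addition and scalar multiplication are defined componentwise.⁴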

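$\mathbb{C}^{\mathbb{R}}$: Vector Space of Complex-Valued Functions over $\mathbb{R}$

$$\mathbb{C}^{\mathbb{R}} = \left\{ x(t) \;\middle|\; x(t) \in \mathbb{C},\; t \in \mathbb{R} \right\}, \tag{1.17c}$$

with the natural addition and scalar multiplication operations:

$$(x + y)(t) = x(t) + y(t), \tag{1.18a}$$

$$(\alpha x)(t) = \alpha\, x(t). \tag{1.18b}$$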

Other vector spaces of sequences and functions can be denoted similarly, for
example, $\mathbb{C}^{\mathbb{N}}$ for complex-valued sequences indexed from 0,
$\mathbb{C}^{\mathbb{R}^+}$ for complex-valued functions on the positive real line,
$\mathbb{C}^{[a,b]}$ for complex-valued functions on the interval $[a, b]$, etc.

The operations of vector addition and scalar multiplication seen above can be
used to define many other vector spaces. For example, componentwise addition and
scalar multiplication can be used to define the vector space of matrices, while the
natural operations of addition and scalar multiplication of functions can be used to
define the vector space of polynomials:

Example 1.1 Fix a positive integer $N$ and consider the real-valued polynomials
of degree at most $(N - 1)$, $x(t) = \sum_{k=0}^{N-1} a_k t^k$. These form a vector
space over $\mathbb{R}$ under the natural addition and multiplication operations. Since
each polynomial is specified by its coefficients, polynomials combine exactly like
vectors in $\mathbb{R}^N$.
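To make the correspondence with $\mathbb{R}^N$ concrete, here is a sketch assuming a
polynomial is stored as its coefficient list $[a_0, \ldots, a_{N-1}]$ (here $N = 3$); the
helper names are ours. Adding and scaling the coefficient lists componentwise gives
the same function values as adding and scaling the polynomials themselves.

```python
def poly_add(a, b):
    """Add two polynomials of degree at most N-1 given as equal-length
    coefficient lists [a_0, ..., a_{N-1}]: componentwise addition in R^N."""
    return [ak + bk for ak, bk in zip(a, b)]

def poly_scale(alpha, a):
    """Scalar multiplication, again componentwise on the coefficients."""
    return [alpha * ak for ak in a]

def poly_eval(a, t):
    """Evaluate x(t) = sum_k a_k t^k using Horner's rule."""
    result = 0
    for ak in reversed(a):
        result = result * t + ak
    return result

x = [1, 0, 2]                       # 1 + 2 t^2
y = [0, -1, 1]                      # -t + t^2
z = poly_add(x, poly_scale(3, y))   # coefficients of x(t) + 3 y(t)
print(z)                            # [1, -3, 5]
print(poly_eval(z, 2), poly_eval(x, 2) + 3 * poly_eval(y, 2))   # 15 15
```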



4 When writing infinite sequences as column vectors, the entry with index zero is boxed to serve 
as a reference point. We will do this also for infinite matrices. 
Definition 1.2 (Subspace) A subset $S$ of a vector space $V$ is a subspace when
it is closed under the operations of vector addition and scalar multiplication:

(i) For all $x$ and $y$ in $S$, $x + y$ is in $S$.
(ii) For all $x$ in $S$ and $\alpha$ in $\mathbb{C}$ (or $\mathbb{R}$), $\alpha x$ is in $S$.



A subspace S is itself a vector space over the same field of scalars as V and with 
the same vector addition and scalar multiplication operations as V. 

Example 1.2 (Subspaces)

(i) Let $x$ be a vector in a vector space $V$. The set of vectors of the form
$\alpha x$ with $\alpha \in \mathbb{C}$ is a subspace.
(ii) In the vector space of complex-valued sequences over $\mathbb{Z}$, the sequences
that are zero outside of $\{2, 3, 4, 5\}$ form a subspace. The same can be said with
$\{2, 3, 4, 5\}$ replaced by any finite or infinite subset of the domain $\mathbb{Z}$.
(iii) In the vector space of real-valued functions on $\mathbb{R}$, the functions that are
constant on intervals $[k - \tfrac{1}{2}, k + \tfrac{1}{2})$, $k \in \mathbb{Z}$, form a
subspace. This is because the sum of two functions each of which is constant on
$[k - \tfrac{1}{2}, k + \tfrac{1}{2})$ is also a function constant on
$[k - \tfrac{1}{2}, k + \tfrac{1}{2})$, while a function constant on
$[k - \tfrac{1}{2}, k + \tfrac{1}{2})$ multiplied by a scalar is also a function constant
on $[k - \tfrac{1}{2}, k + \tfrac{1}{2})$.
(iv) In the vector space of real-valued functions on the interval
$[-\tfrac{1}{2}, \tfrac{1}{2}]$ under the natural operations of addition and scalar
multiplication (1.18), the set of odd functions,

$$S_{\text{odd}} = \left\{ x(t) \;\middle|\; x(t) = -x(-t) \text{ for all } t \in [-\tfrac{1}{2}, \tfrac{1}{2}] \right\}, \tag{1.19a}$$

is a subspace. Similarly, the set of even functions,

$$S_{\text{even}} = \left\{ x(t) \;\middle|\; x(t) = x(-t) \text{ for all } t \in [-\tfrac{1}{2}, \tfrac{1}{2}] \right\}, \tag{1.19b}$$

is also a subspace. Either is easily checked, as the sum of two odd (even)
functions yields an odd (even) function, and scalar multiplication of an odd
(even) function yields again an odd (even) function.

Definition 1.3 (Affine subspace) A subset $T$ of a vector space $V$ is an affine
subspace when there exist a vector $x \in V$ and a subspace $S \subset V$ such that
any $t \in T$ can be written as $x + s$ for some $s \in S$.



Beware that an affine subspace is not necessarily a subspace; it is a subspace if
and only if it includes $\mathbf{0}$. Affine subspaces generalize the concept of a plane
in Euclidean geometry; subspaces correspond just to planes that include the origin.
Affine subspaces are convex sets, meaning that if vectors $x$ and $y$ are in the set,
so is any vector $\theta x + (1 - \theta) y$ for $\theta \in [0, 1]$.




Example 1.3 (Affine subspaces)

(i) Let $x$ and $y$ be vectors in a vector space $V$. The set of vectors of the form
$x + \alpha y$ with $\alpha \in \mathbb{C}$ is an affine subspace.
(ii) In the vector space of complex-valued sequences over $\mathbb{Z}$, the sequences
that equal 1 outside of $\{2, 3, 4, 5\}$ form an affine subspace.

The definition of a subspace is suggestive of one way in which subspaces arise:
by combining a finite number of vectors in $V$. The set of all finite linear
combinations of the elements of a set is its span, which we now define.



Definition 1.4 (Span) The span of a set of vectors $S$ is the set of all finite linear
combinations of vectors in $S$:

$$\operatorname{span}(S) = \left\{ \sum_{k=0}^{N-1} \alpha_k \varphi_k \;\middle|\; \alpha_k \in \mathbb{C} \text{ (or } \mathbb{R}\text{)},\; \varphi_k \in S, \text{ and } N \in \mathbb{N} \right\}.$$

Note that a span is always a subspace and that the sum has a finite number of
terms even if the set $S$ is infinite.

Example 1.4 Proper subspaces (those smaller than the entire space) arise in
linear algebra when one looks at matrix-vector products with rank-deficient
matrices. Consider the vector space $V = \mathbb{R}^N$ and let
$S = \{y = Ax \mid x \in \mathbb{R}^N\}$, where $A$ is a real-valued matrix of size
$N \times N$ and rank $M$, with $M < N$. Applying the conditions in the definition of
a subspace (Definition 1.2) to $S$, and using the properties of matrix multiplication,
we verify that $S$ is indeed an $M$-dimensional subspace of $\mathbb{R}^N$. As per
(1.206a) in Appendix 1.B, this subspace is the span of the columns of $A$.

Many different sets can have the same span, and it can be of fundamental 
interest to find the smallest set with a particular span. This leads to the dimension 
of a vector space, which depends on the concept of linear independence. 

Definition 1.5 (Linear independence) The set of vectors
$\{\varphi_0, \varphi_1, \ldots, \varphi_{N-1}\}$ is called linearly independent when
$\sum_{k=0}^{N-1} \alpha_k \varphi_k = \mathbf{0}$ is true only if $\alpha_k = 0$ for all $k$.
Otherwise, the set is linearly dependent. An infinite set of vectors is called linearly
independent when every finite subset is linearly independent.



Definition 1.6 (Dimension) A vector space V is said to have dimension N 
when it contains a linearly independent set with N elements and every set with 
N + 1 or more elements is linearly dependent. If no such finite N exists, the vector 
space is infinite dimensional. 
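For finitely many vectors in $\mathbb{R}^N$, linear independence can be tested by
computing a rank: the set is linearly independent exactly when its rank equals its
size. The sketch below uses Gaussian elimination with exact `fractions.Fraction`
arithmetic to avoid rounding issues; the function names are ours, and the
$3 \times 3$ matrix of rank 2 is a made-up instance of Example 1.4.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination in exact
    rational arithmetic."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0                                   # number of pivots found so far
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):          # eliminate this column elsewhere
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """Definition 1.5 for a finite set: independent iff rank equals size."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, 0), (0, 1)]))            # True
print(linearly_independent([(1, 0), (0, 1), (-1, -1)]))  # False: three vectors in R^2
# Columns of a rank-deficient A span a subspace of dimension rank(A).
A_columns = [(1, 2, 3), (2, 4, 6), (0, 1, 1)]
print(rank(A_columns))                                   # 2
```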
1.2.2 Inner Product

Our intuition from Euclidean spaces goes farther than just adding and multiplying. 
It has geometric notions of orientation and orthogonality as well as metric notions 
of norm and distance. In this and the next subsection, we extend these to our 
abstract spaces. 

As visualized in Figure 1.2, an inner product is like a signed norm of an
orthogonal projection of one vector onto a subspace spanned by another. It thus
measures norm along with relative orientation.



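Definition 1.7 (Inner product) An inner product on a vector space $V$ over
$\mathbb{C}$ (or $\mathbb{R}$) is a complex-valued (or real-valued) function
$\langle \cdot, \cdot \rangle$ defined on $V \times V$ with the following properties for
any $x, y, z \in V$ and $\alpha \in \mathbb{C}$ (or $\mathbb{R}$):

(i) Distributivity: $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$.

(ii) Linearity in the first argument: $\langle \alpha x, y\rangle = \alpha \langle x, y\rangle$.

(iii) Hermitian symmetry: $\langle x, y\rangle^* = \langle y, x\rangle$.

(iv) Positive definiteness: $\langle x, x\rangle \geq 0$, and $\langle x, x\rangle = 0$ if and
only if $x = \mathbf{0}$.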



Note that (ii) and (iii) imply $\langle x, \alpha y\rangle = \alpha^* \langle x, y\rangle$.
Thus, along with being linear in the first argument, the inner product is
conjugate-linear in the second argument.⁵ Also note that the inner product being a
number excludes the possibility of it being a divergent (infinite) quantity. Thus, an
inner product on $V$ must return a finite number for every pair of vectors in $V$.
This constrains both the functional form of an inner product and the set of vectors
to which it can be applied.

Example 1.5 (Inner product) Consider the vector space $\mathbb{C}^2$.

(i) $\langle x, y\rangle = x_0 y_0^* + 5 x_1 y_1^*$ is a valid inner product; it satisfies all
the conditions of Definition 1.7.
(ii) $\langle x, y\rangle = x_0 y_0^* + x_1^* y_1$ is not a valid inner product; it violates
Definition 1.7(ii). For example, if $x = y = [0 \;\; 1]^\top$ and $\alpha = j$, then
$\langle \alpha x, y\rangle = -j$ and $\alpha \langle x, y\rangle = j$, and thus
$\langle \alpha x, y\rangle \neq \alpha \langle x, y\rangle$.
(iii) $\langle x, y\rangle = x_0 y_0^*$ is not a valid inner product; it violates
Definition 1.7(iv) because $x = [0 \;\; 1]^\top$ is nonzero yet yields $\langle x, x\rangle = 0$.
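The candidates in Example 1.5 can be probed numerically, with the caveat that a
check at a single point can only exhibit a counterexample, never prove validity.
This sketch encodes each candidate as a Python function on pairs of complex
numbers (the names are ours) and reproduces the counterexamples given above,
with `1j` playing the role of $j$.

```python
def ip_weighted(x, y):
    """Candidate (i): x0 y0* + 5 x1 y1*."""
    return x[0] * y[0].conjugate() + 5 * x[1] * y[1].conjugate()

def ip_misplaced_conj(x, y):
    """Candidate (ii): x0 y0* + x1* y1."""
    return x[0] * y[0].conjugate() + x[1].conjugate() * y[1]

def ip_first_only(x, y):
    """Candidate (iii): x0 y0*."""
    return x[0] * y[0].conjugate()

def linear_in_first(ip, x, y, alpha):
    """Check Definition 1.7(ii) at one point: <alpha x, y> == alpha <x, y>."""
    return ip((alpha * x[0], alpha * x[1]), y) == alpha * ip(x, y)

x = y = (0j, 1 + 0j)      # x = y = [0 1]^T
alpha = 1j
print(linear_in_first(ip_weighted, x, y, alpha))        # True
print(linear_in_first(ip_misplaced_conj, x, y, alpha))  # False: -1j versus 1j
print(ip_first_only(x, x))                              # 0j although x is nonzero
```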



Standard Inner Product on $\mathbb{C}^N$ The standard inner product on $\mathbb{C}^N$ is

$$\langle x, y\rangle = \sum_{n=0}^{N-1} x_n y_n^* = y^* x, \tag{1.20a}$$

where the second equality uses matrix-vector multiplication to express the sum,
with vectors implicitly column vectors and $^*$ denoting the Hermitian transpose
operation. While we will use this inner product frequently and without special
mention, this is not the only valid inner product for $\mathbb{C}^N$ (or $\mathbb{R}^N$)
(see Exercise 1.6).

5 We have used the convention that is dominant in mathematics; in physics, on the other hand,
the inner product is defined to be linear in the second argument and conjugate-linear in the first.

Standard Inner Product on $\mathbb{C}^{\mathbb{Z}}$ The standard inner product on the
vector space of complex-valued sequences over $\mathbb{Z}$ is

$$\langle x, y\rangle = \sum_{n \in \mathbb{Z}} x_n y_n^* = y^* x, \tag{1.20b}$$

where we are taking the unusual step of using matrix product notation with an
infinite row vector $y^*$ and an infinite column vector $x$. As stated above, the sum
must converge to a finite number for the inner product to be valid, restricting the
set of vectors on which we can operate.

Standard Inner Product on $\mathbb{C}^{\mathbb{R}}$ The standard inner product on the
vector space of complex-valued functions over $\mathbb{R}$ is

$$\langle x, y\rangle = \int_{-\infty}^{\infty} x(t)\, y^*(t)\, dt. \tag{1.20c}$$

We must be careful that the integral exists and is finite for the inner product to
be valid, restricting the set of vectors on which we can operate. We restrict this
set even further to those functions with a countable number of discontinuities, thus
eliminating a number of subtle technical issues.⁶
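For functions that vanish outside $[-\tfrac{1}{2}, \tfrac{1}{2}]$, the integral in (1.20c)
reduces to an integral over that interval, which can be approximated numerically.
This sketch swaps in the midpoint rule with a hypothetical step count `n = 2000`
and, as an illustrative choice, the functions $1$ and $\sqrt{2}\cos(2\pi k t)$
restricted to $[-\tfrac{1}{2}, \tfrac{1}{2}]$ (they reappear in Example 1.6). Their Gram
matrix comes out close to the identity, up to rounding.

```python
import math

def inner_l2(x, y, a=-0.5, b=0.5, n=2000):
    """Midpoint-rule approximation of (1.20c) for real-valued functions that
    vanish outside [a, b]; for real y, y*(t) = y(t)."""
    h = (b - a) / n
    return h * sum(x(a + (i + 0.5) * h) * y(a + (i + 0.5) * h) for i in range(n))

def phi(k):
    """Illustrative functions on [-1/2, 1/2]: 1 for k = 0, sqrt(2) cos(2 pi k t) otherwise."""
    if k == 0:
        return lambda t: 1.0
    return lambda t: math.sqrt(2) * math.cos(2 * math.pi * k * t)

# Gram matrix of inner products <phi_k, phi_m> for k, m = 0, ..., 3
gram = [[inner_l2(phi(k), phi(m)) for m in range(4)] for k in range(4)]
for row in gram:
    print([round(v, 6) for v in row])   # close to a row of the 4x4 identity
```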

Orthogonality 

An inner product endows a space with the geometric properties of orientation, for 
example perpendicularity and angles. In particular, an inner product being zero 
has special significance. 



Definition 1.8 (Orthogonality) 












(i) 


Vectors x and y are 


said to be orthogonal when 


{x 


,y) = 


0, written as 


x _L 


y- 


(ii) 


A set of vectors S 
such that x 7^ y. 


is called orthogonal when x 


± 


y for 


every x and 


V in 


s 


(iii) 


A set of vectors S is 
for every x in S. 


i called orthonormal when it 


is 


ortho; 


i;onal and (a;, 


x) = 


: 1 



6 For the inner product to be positive definite, as per Definition 1.7(iv), we must identify any
function satisfying $\int_{-\infty}^{\infty} |x(t)|^2\,dt = 0$ with 0. From this point on, we restrict our attention to
Lebesgue-measurable functions, and all integrals should be seen as Lebesgue integrals. In other
words, we exclude from consideration those functions that are not "well behaved" in the above
sense. This restriction is not unduly stringent for any practical purpose. We follow the creed
of R. W. Hamming [66]: ". . . if whether an airplane would fly or not depended on whether some
function . . . was Lebesgue but not Riemann integrable, then I would not fly in it."
Figure 1.5: Example functions from (1.21): (a) $\varphi_0(t) = 1$; (b) $\varphi_1(t) = \sqrt{2}\cos(2\pi t)$; (c) $\varphi_2(t) = \sqrt{2}\cos(4\pi t)$.



(iv) A vector $x$ is said to be orthogonal to a set of vectors $S$ when $x \perp s$ for all $s \in S$, written as $x \perp S$.
(v) Two sets $S_0$ and $S_1$ are said to be orthogonal when every vector $s_0$ in $S_0$ is orthogonal to the set $S_1$, written as $S_0 \perp S_1$.
(vi) Given a subspace $S$ of a vector space $V$, the orthogonal complement of $S$, denoted $S^\perp$, is the set $\{x \in V \mid x \perp S\}$.



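Note that the set $S^\perp$ is a subspace as well. Vectors in an orthonormal set $\{\varphi_k\}_{k\in\mathcal{K}}$
are linearly independent, since $0 = \sum_k \alpha_k \varphi_k$ implies that
$0 = \langle \sum_k \alpha_k \varphi_k, \varphi_i\rangle = \sum_k \alpha_k \langle \varphi_k, \varphi_i\rangle = \alpha_i$ for any $i$.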

Example 1.6 (Orthogonality) Consider the set of vectors $\Phi = \{\varphi_k\}_{k\in\mathbb{N}}$
where

$$\varphi_0(t) = 1, \tag{1.21a}$$
$$\varphi_k(t) = \sqrt{2}\cos(2\pi k t), \quad k = 1, 2, \ldots \tag{1.21b}$$



The functions $\varphi_0$, $\varphi_1$, and $\varphi_2$ are shown in Figure 1.5. Using inner product
(1.20c), we have the following properties:

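(i) For any $k, m \in \mathbb{Z}^+$ with $k \neq m$, vectors $\varphi_k$ and $\varphi_m$ are orthogonal because

$$\begin{aligned}
\langle \varphi_k, \varphi_m\rangle &= 2\int_{-1/2}^{1/2} \cos(2\pi k t)\cos(2\pi m t)\,dt \\
&\overset{(a)}{=} \int_{-1/2}^{1/2} \left[\cos(2\pi(k+m)t) + \cos(2\pi(k-m)t)\right] dt \\
&= \frac{1}{2\pi}\left[\frac{1}{k+m}\sin(2\pi(k+m)t) + \frac{1}{k-m}\sin(2\pi(k-m)t)\right]_{-1/2}^{1/2} = 0,
\end{aligned}$$

where (a) follows from the trigonometric identity for the product of cosines.
It is easy to check that for any $k \in \mathbb{Z}^+$, the vectors $\varphi_0$ and $\varphi_k$ are orthogonal.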
(ii) The set of vectors $\Phi$ is orthogonal because, using (i), $\varphi_k \perp \varphi_m$ for every $\varphi_k$
and $\varphi_m$ in $\Phi$ such that $\varphi_k \neq \varphi_m$.
(iii) The set of vectors $\Phi$ is orthonormal because it is orthogonal, as we have
just shown in (ii), and for any $k = 1, 2, \ldots$:

$$\begin{aligned}
\langle \varphi_k, \varphi_k\rangle &= 2\int_{-1/2}^{1/2} \left(\cos(2\pi k t)\right)^2 dt \overset{(a)}{=} 2\int_{-1/2}^{1/2} \frac{1 + \cos(4\pi k t)}{2}\,dt \\
&= \left[t + \frac{1}{4\pi k}\sin(4\pi k t)\right]_{-1/2}^{1/2} = 1,
\end{aligned}$$

where (a) follows from the double-angle formula for cosine. The inner product
of $\varphi_0$ with itself is trivially 1.
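(iv) For any $k \in \mathbb{N}$, the vector $\varphi_k$ is orthogonal to the set of odd functions $S_{\text{odd}}$
defined in (1.19a) because $\varphi_k$ is orthogonal to every $s \in S_{\text{odd}}$:

$$\begin{aligned}
\langle \varphi_k, s\rangle &= \int_{-1/2}^{1/2} \sqrt{2}\cos(2\pi k t)\,s(t)\,dt \\
&= \sqrt{2}\left(\int_{-1/2}^{0} \cos(2\pi k t)\,s(t)\,dt + \int_{0}^{1/2} \cos(2\pi k t)\,s(t)\,dt\right) \\
&\overset{(a)}{=} \sqrt{2}\left(-\int_{-1/2}^{0} \cos(2\pi k t)\,s(-t)\,dt + \int_{0}^{1/2} \cos(2\pi k t)\,s(t)\,dt\right) \\
&\overset{(b)}{=} \sqrt{2}\left(-\int_{0}^{1/2} \cos(2\pi k \tau)\,s(\tau)\,d\tau + \int_{0}^{1/2} \cos(2\pi k t)\,s(t)\,dt\right) = 0,
\end{aligned}$$

where (a) follows from the definition of an odd function; and (b) from the
change of variable $\tau = -t$ and the fact that cosine is an even function, that
is, $\cos(-2\pi k\tau) = \cos(2\pi k\tau)$.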
(v) The set $\Phi$ is orthogonal to the set of odd functions $S_{\text{odd}}$ because each vector
in $\Phi$ is orthogonal to $S_{\text{odd}}$, using (iv).
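Properties (i) to (v) can be sanity-checked numerically. The sketch below approximates the inner product (1.20c) on $[-1/2, 1/2]$ with a midpoint rule and forms the Gram matrix of $\varphi_0, \ldots, \varphi_3$, which should be close to the identity. The grid size (4000 points) and the odd test function $s(t) = t^3$, standing in for an element of $S_{\text{odd}}$, are our own illustrative choices. Everything is real valued, so no conjugation is applied.

```python
import math

def phi(k, t):
    """Functions from (1.21): phi_0(t) = 1 and phi_k(t) = sqrt(2) cos(2 pi k t)."""
    if k == 0:
        return 1.0
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * k * t)

def inner(f, g, a=-0.5, b=0.5, n=4000):
    """Midpoint-rule approximation of the integral of f(t) g(t) over [a, b].
    Both functions are real valued here, so no conjugation is applied."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

K = 4
# Gram matrix of phi_0, ..., phi_{K-1}; by (i)-(iii) it should be the identity.
gram = [[inner(lambda t, k=k: phi(k, t), lambda t, m=m: phi(m, t)) for m in range(K)]
        for k in range(K)]
for row in gram:
    print(" ".join(f"{v:+.6f}" for v in row))

# Property (iv), with the odd function s(t) = t**3 as a stand-in for S_odd.
odd_products = [inner(lambda t, k=k: phi(k, t), lambda t: t ** 3) for k in range(K)]
print(odd_products)
```

A numerical check like this only illustrates the result. The closed-form integrals above are what prove it.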



Inner Product Spaces 

A vector space equipped with an inner product from Definition 1.7 becomes an inner
product space (sometimes also called a pre-Hilbert space). As we have mentioned,
on $\mathbb{C}^{\mathbb{Z}}$ and $\mathbb{C}^{\mathbb{R}}$, we must exercise caution and choose the subspace for which the
inner product is finite.

1.2.3 Norm 

A norm is a function that assigns a length, or size, to a vector (analogously to the 
magnitude of a scalar). 
Definition 1.9 (Norm) A norm on a vector space $V$ over $\mathbb{C}$ (or $\mathbb{R}$) is a
real-valued function $\|\cdot\|$ defined on $V$ with the following properties for any $x, y \in V$
and $\alpha \in \mathbb{C}$ (or $\mathbb{R}$):
(i) Positive definiteness: $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$.
(ii) Positive scalability: $\|\alpha x\| = |\alpha|\,\|x\|$.
(iii) Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$. For a norm induced by an inner product, equality holds if and only if $x = 0$ or $y = \alpha x$ for some real $\alpha \ge 0$.

Note that the same comments we made about the finiteness and validity of an inner 
product apply to the norm as well. 

The triangle inequality gets its name from the following
geometric interpretation: the length of any side of a triangle is smaller than or equal
to the sum of the lengths of the other two sides; equality occurs only when two sides
are collinear, that is, when the triangle degenerates into a line segment. For example,
if $V = \mathbb{C}$ and the standard norm is used, the triangle inequality becomes:

$$|x + y| \le |x| + |y| \quad \text{for any } x, y \in \mathbb{C}. \tag{1.22}$$

An inner product can be used to define a norm, which we say is the norm
induced by the inner product. The three inner products we have seen in (1.20)
induce corresponding standard norms in $\mathbb{C}^N$, $\mathbb{C}^{\mathbb{Z}}$, and $\mathbb{C}^{\mathbb{R}}$, respectively. Not all
norms are induced by inner products. We will see examples of this both here and
in Section 1.2.4.

Example 1.7 (Norm) Consider the vector space $\mathbb{C}^2$.

(i) $\|x\| = \sqrt{|x_0|^2 + 5|x_1|^2}$ is a valid norm; it satisfies all the conditions of Definition 1.9. It is induced by the inner product from Example 1.5(i).
(ii) $\|x\| = |x_0| + |x_1|$ is a valid norm. However, it is not induced by any inner
product.
(iii) $\|x\| = |x_0|$ is not a valid norm; it violates Definition 1.9(i) because $x = [0 \;\; 1]^\top$ is nonzero yet yields $\|x\| = 0$.

Standard Norm on $\mathbb{C}^N$ The standard norm on $\mathbb{C}^N$, induced by the inner product
(1.20a), is:

$$\|x\| = \sqrt{\langle x, x\rangle} = \left(\sum_{n=0}^{N-1} |x_n|^2\right)^{1/2}. \tag{1.23a}$$

This is also called the Euclidean norm and yields the conventional notion of length.
Standard Norm on $\mathbb{C}^{\mathbb{Z}}$ The standard norm on $\mathbb{C}^{\mathbb{Z}}$, induced by the inner product
(1.20b), is:

$$\|x\| = \sqrt{\langle x, x\rangle} = \left(\sum_{n\in\mathbb{Z}} |x_n|^2\right)^{1/2}. \tag{1.23b}$$

Figure 1.6: Illustration of the parallelogram law.



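Standard Norm on $\mathbb{C}^{\mathbb{R}}$ The standard norm on $\mathbb{C}^{\mathbb{R}}$, induced by the inner product
(1.20c), is:

$$\|x\| = \sqrt{\langle x, x\rangle} = \left(\int_{-\infty}^{\infty} |x(t)|^2\,dt\right)^{1/2}. \tag{1.23c}$$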

Properties of Norms Induced by an Inner Product 

The following facts hold in any inner product space. 

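Cauchy-Schwarz Inequality This widely used inequality states that⁷

$$|\langle x, y\rangle| \le \|x\|\,\|y\|, \tag{1.24}$$

with equality if and only if $x$ and $y$ are linearly dependent, that is, $x = \alpha y$ or $y = \alpha x$ for some
scalar $\alpha$. We will see an example of the Cauchy-Schwarz inequality shortly.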
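The inequality (1.24) can be seen in action with the standard inner product (1.20a) on $\mathbb{C}^N$. The sketch below draws two random complex vectors (the seed and dimension are arbitrary choices) and compares both sides. It then forms a scalar multiple of one of them, which lands in the equality case.

```python
import math
import random

def inner(x, y):
    """Standard inner product on C^N, (1.20a): sum over n of x_n * conj(y_n)."""
    return sum(xn * yn.conjugate() for xn, yn in zip(x, y))

def norm(x):
    """Norm induced by the inner product, (1.23a)."""
    return math.sqrt(inner(x, x).real)

rng = random.Random(1)  # fixed seed so the run is reproducible

def random_vector(n):
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

x, y = random_vector(5), random_vector(5)
print(abs(inner(x, y)), "<=", norm(x) * norm(y))   # strict for generic vectors

# Equality case: z is a scalar multiple of y.
alpha = 2 - 3j
z = [alpha * yn for yn in y]
print(abs(inner(z, y)), "==", norm(z) * norm(y))
```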

Parallelogram Law This law generalizes from Euclidean geometry to an inner
product space and states that (see Figure 1.6 for an illustration):

$$\|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right). \tag{1.25}$$

Note that even though no inner product appears in the parallelogram law, it is
guaranteed to hold only in an inner product space. In fact, (1.25) is a necessary and
sufficient condition for a norm to be induced by an inner product (see Exercise 1.10).
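Because (1.25) characterizes norms induced by inner products, a single pair of vectors that violates it is enough to show that a norm is not induced. The sketch below evaluates both sides of (1.25) for the 2 norm and for the norm $|x_0| + |x_1|$ of Example 1.7(ii), using the pair $x = [1\;\;0]^\top$, $y = [0\;\;1]^\top$ (our choice of counterexample). The function names are illustrative.

```python
def p_norm(x, p):
    """p norm on R^N for p >= 1; p = 2 is the Euclidean norm (1.23a)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def parallelogram_gap(x, y, p):
    """Left side minus right side of (1.25) when the p norm is used."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = p_norm(s, p) ** 2 + p_norm(d, p) ** 2
    rhs = 2 * (p_norm(x, p) ** 2 + p_norm(y, p) ** 2)
    return lhs - rhs

x, y = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_gap(x, y, 2))  # essentially 0 (rounding error only)
print(parallelogram_gap(x, y, 1))  # 4.0, so (1.25) fails for the 1 norm
```

Testing can only refute (1.25) for a given norm. That the 2 norm satisfies it for every pair follows from expanding $\langle x \pm y, x \pm y\rangle$, not from any finite set of trials.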



Pythagorean Theorem Like the parallelogram law, this theorem generalizes from 
Euclidean geometry to an inner product space. The statement learned in elementary 
school involves the sides of a right triangle. In its more general form the theorem 



7 The result for sums is due to Cauchy, while the result for integrals is due to Schwarz. Buniakovsky
seems to have published the result for integrals six years earlier than Schwarz; thus, the
integral version is sometimes referred to as the Buniakovsky inequality.
states that⁸

$$x \perp y \quad\text{implies}\quad \|x + y\|^2 = \|x\|^2 + \|y\|^2. \tag{1.26a}$$

Among many possible proofs of the theorem, one follows from expanding $\langle x + y, x + y\rangle$
into four terms and noting that $\langle x, y\rangle = \langle y, x\rangle = 0$ because of orthogonality. By
induction, the Pythagorean theorem holds in a more general form for any countable
set of orthogonal vectors:

$$\{x_k\}_{k\in\mathcal{K}} \text{ orthogonal} \quad\text{implies}\quad \Big\|\sum_{k\in\mathcal{K}} x_k\Big\|^2 = \sum_{k\in\mathcal{K}} \|x_k\|^2. \tag{1.26b}$$
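Here is a small instance of (1.26b) in $\mathbb{C}^4$. The vectors with entries $e^{j2\pi kn/N}$, $k = 0, \ldots, 3$, are mutually orthogonal under (1.20a). Scaling them by arbitrary complex coefficients preserves orthogonality. The squared norm of the sum should then equal the sum of the squared norms, which here is $4(1 + 5 + 0.25 + 9) = 61$. The choice of vectors and coefficients is ours, made for illustration only.

```python
import cmath

N = 4

def dft_vector(k):
    """Vector in C^N with entries exp(j 2 pi k n / N), n = 0, ..., N-1."""
    return [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]

def inner(x, y):
    """Standard inner product on C^N, (1.20a)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def sq_norm(x):
    return inner(x, x).real

# Scaled copies of distinct vectors dft_vector(0), ..., dft_vector(3): mutually orthogonal.
coeffs = [1.0, 2 - 1j, 0.5j, -3.0]
vectors = [[c * v for v in dft_vector(k)] for k, c in enumerate(coeffs)]
total = [sum(entries) for entries in zip(*vectors)]

print(sq_norm(total), sum(sq_norm(v) for v in vectors))  # both approximately 61
```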

Normed Vector Spaces 

A vector space equipped with a norm becomes a normed vector space. As with 
the inner product, we must exercise caution and choose the subspace for which the 
norm is finite. 



Metric 

Intuitively, the length of a vector can be thought of as the vector's distance from 
the origin. This extends naturally to a metric induced by a norm, or a distance. 



Definition 1.10 (Metric, or Distance) In a normed vector space, the metric,
or distance, between vectors $x$ and $y$ is the norm of their difference:

$$d(x, y) = \|x - y\|.$$



Much as norms induced by inner products are a small fraction of all possible norms, 
metrics induced by norms are a small fraction of all possible metrics. In this book, 
we will have no need for more general concepts of metric; for the interested reader, 
Exercise 1.13 gives the axioms that a metric must satisfy and explores metrics that
are not induced by norms. 



1.2.4 Standard Spaces 

We now discuss some standard vector spaces: first inner product spaces (they are 
also normed vector spaces as their inner products induce the corresponding norms) , 
followed by other normed vector spaces (those for which the norms are not induced 
by an inner product). 



8 The theorem was found on a Babylonian tablet circa 1900–1600 B.C., and it is not clear
whether Pythagoras himself or one of his disciples stated and later proved the theorem. The first
written proof and reference to the theorem are in Euclid's Elements [68].
Standard Inner Product Spaces

The first three spaces, $\mathbb{C}^N$, $\ell^2(\mathbb{Z})$, and $\mathcal{L}^2(\mathbb{R})$, are the spaces most often used in this
book.⁹ For each, the inner product and norm have been defined already in (1.20)
and (1.23); we repeat them for each space for easy reference.

$\mathbb{C}^N$: Space of Complex-Valued Finite-Dimensional Vectors We repeat the expressions
for the inner product (1.20a) and the norm (1.23a) it induces:

$$\langle x, y\rangle = \sum_{n=0}^{N-1} x_n y_n^*, \qquad \|x\| = \left(\sum_{n=0}^{N-1} |x_n|^2\right)^{1/2}. \tag{1.27}$$

The above norm is not the only norm possible on $\mathbb{C}^N$; in the next subsection, we
will introduce $p$ norms as possible alternatives.

$\ell^2(\mathbb{Z})$: Space of Square-Summable Sequences This is the normed vector space of
square-summable complex-valued sequences, and it uses the inner product (1.20b)
and the norm (1.23b):

$$\langle x, y\rangle = \sum_{n\in\mathbb{Z}} x_n y_n^*, \qquad \|x\| = \left(\sum_{n\in\mathbb{Z}} |x_n|^2\right)^{1/2}. \tag{1.28}$$

This space is often referred to as the space of finite-energy sequences.

By the Cauchy-Schwarz inequality (1.24), the finiteness of $\|x\|$ and $\|y\|$ for
any $x$ and $y$ in $\ell^2(\mathbb{Z})$ implies the inner product $\langle x, y\rangle$ is finite, provided the sum in
the inner product is well defined. A somewhat technical point is that the square-summability
condition that determines which sequences are in $\ell^2(\mathbb{Z})$ also ensures
that the sum in the inner product is indeed well defined; see Exercise 1.12.

$\mathcal{L}^2(\mathbb{R})$: Space of Square-Integrable Functions This is the normed vector space
of square-integrable complex-valued functions, and it uses the inner product (1.20c)
and the norm (1.23c):

$$\langle x, y\rangle = \int_{-\infty}^{\infty} x(t)\,y^*(t)\,dt, \qquad \|x\| = \left(\int_{-\infty}^{\infty} |x(t)|^2\,dt\right)^{1/2}. \tag{1.29}$$

This space is often referred to as the space of finite-energy functions. According to
Definition 1.6, this space is infinite dimensional; for example, $\{e^{-t^2}, te^{-t^2}, t^2e^{-t^2},
\ldots\}$ are linearly independent. As in the case of $\ell^2(\mathbb{Z})$, the inner product is always
well defined.

We can restrict the domain $\mathbb{R}$ to just an interval $[a, b]$, in which case the space
becomes $\mathcal{L}^2([a, b])$, that is, the space of complex-valued square-integrable functions
on the interval $[a, b]$. The inner product and norm follow naturally from (1.29):

$$\langle x, y\rangle = \int_a^b x(t)\,y^*(t)\,dt, \qquad \|x\| = \left(\int_a^b |x(t)|^2\,dt\right)^{1/2}. \tag{1.30}$$

9 The reasoning behind naming $\ell^2(\mathbb{Z})$ and $\mathcal{L}^2(\mathbb{R})$ will become clear in the section on standard
normed vector spaces shortly.



In $\mathcal{L}^2([a, b])$, the Cauchy-Schwarz inequality (1.24) becomes

$$\left|\int_a^b x(t)\,y^*(t)\,dt\right| \le \left(\int_a^b |x(t)|^2\,dt\right)^{1/2} \left(\int_a^b |y(t)|^2\,dt\right)^{1/2}, \tag{1.31}$$



with equality if and only if $x(t)$ and $y(t)$ are linearly dependent. By setting $y(t) = 1$
and squaring both sides, we get another useful fact:

$$\left|\int_a^b x(t)\,dt\right|^2 \le (b - a)\int_a^b |x(t)|^2\,dt, \tag{1.32}$$

with equality if and only if $x(t)$ is constant on $[a, b]$.
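Both sides of (1.32) can be compared numerically, with the integrals approximated by a midpoint rule (an assumption of this sketch; any accurate quadrature would do). For $x(t) = t^2$ on $[0, 1]$ the exact sides are $1/9$ and $1/5$. For a constant function the two sides coincide, as the equality condition predicts. The test functions and intervals are illustrative choices.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def sides_of_1_32(x, a, b):
    """Return (|integral of x|^2, (b - a) * integral of |x|^2) on [a, b]."""
    lhs = abs(integrate(x, a, b)) ** 2
    rhs = (b - a) * integrate(lambda t: abs(x(t)) ** 2, a, b)
    return lhs, rhs

print(sides_of_1_32(lambda t: t * t, 0.0, 1.0))  # about (1/9, 1/5): strict inequality
print(sides_of_1_32(lambda t: 3.0, -1.0, 2.0))   # about (81, 81): equality, x is constant
```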



$C^q([a, b])$: Spaces of Continuous Functions with $q$ Continuous Derivatives For
any finite $a$ and $b$, the space $C([a, b])$ is defined as the space of complex-valued
continuous functions over $[a, b]$ with inner product and norm given in (1.30). The
space of complex-valued continuous functions over $[a, b]$ that are further restricted
to have $q$ continuous derivatives is denoted $C^q([a, b])$, so $C([a, b]) = C^0([a, b])$. Each
$C^q([a, b])$ space is a subspace of $\mathbb{C}^{[a, b]}$. As an example, the set of polynomial functions
forms a subspace of $C^q([a, b])$ for any $a, b \in \mathbb{R}$ and $q \in \mathbb{N}$. This is because
the set is closed under the vector space operations and polynomials are infinitely
differentiable.

A $C^q([a, b])$ space is very similar to $\mathcal{L}^2([a, b])$. The distinction is completeness, which
$C^q([a, b])$ can lack and which is a defining characteristic of the Hilbert spaces that
we introduce in the next section.



Spaces of Random Variables The set of complex random variables defined in
some probabilistic model forms a vector space over the complex numbers; all the
properties required in Definition 1.1 are inherited from the complex numbers, with
the constant 0 as the additive identity.

A useful inner product to define on this vector space is

$$\langle x, y\rangle = \mathrm{E}[x y^*]. \tag{1.33}$$

This clearly satisfies properties (i)-(iii) of Definition 1.7, and also

$$\langle x, x\rangle = \mathrm{E}[x x^*] = \mathrm{E}[|x|^2] \ge 0.$$

Only the second part of Definition 1.7(iv) is subtle. It is indeed the case that
$\mathrm{E}[x x^*] = 0$ implies $x = 0$. This is because of the sense of equality for random
variables reviewed in Appendix 1.C. The norm induced by this inner product is

$$\|x\| = \sqrt{\langle x, x\rangle} = \sqrt{\mathrm{E}[|x|^2]}. \tag{1.34}$$
This shows that if we restrict to random variables with finite second moment,
$\mathrm{E}[|x|^2] < \infty$, we have a normed vector space.

When $\mathrm{E}[x] = 0$, the norm $\|x\|$ is the standard deviation of $x$. When $\mathrm{E}[y] = 0$
also, $\langle x, y\rangle$ is the covariance of $x$ and $y$; the normalized inner product $\langle x, y\rangle/(\|x\|\,\|y\|)$
equals the correlation coefficient.
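To make the last statement concrete, the sketch below replaces the expectations in (1.33) and (1.34) with sample means over simulated data. This is an estimate, not the exact inner product. The model is a hypothetical one chosen so the answer is known: $x$ and $z$ are independent zero-mean, unit-variance Gaussians and $y = 0.6x + 0.8z$. That gives $\mathrm{E}[x] = \mathrm{E}[y] = 0$, $\|x\| = \|y\| = 1$, and a correlation coefficient of $0.6$.

```python
import math
import random

rng = random.Random(0)
n = 200_000

# Hypothetical model: x, z independent N(0, 1); y = 0.6 x + 0.8 z, so E[x y] = 0.6.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.6 * a + 0.8 * b for a, b in zip(x, z)]

def inner(u, v):
    """Sample-mean estimate of the inner product (1.33), E[u v*], for real data."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def norm(u):
    """Estimate of the induced norm (1.34), sqrt(E[|u|^2])."""
    return math.sqrt(inner(u, u))

rho = inner(x, y) / (norm(x) * norm(y))
print(f"estimated correlation coefficient: {rho:.3f}")  # close to 0.6
```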

Standard Normed Vector Spaces 

$\mathbb{C}^N$ Spaces As we said earlier, we can define other norms on $\mathbb{C}^N$. For example,
the $p$ norm is defined as

$$\|x\|_p = \left(\sum_{n=0}^{N-1} |x_n|^p\right)^{1/p}, \tag{1.35a}$$

for $p \in [1, \infty)$. Since the sum above has a finite number of terms, there is no doubt
that the sums converge. Thus, we take as a vector space of interest the entire $\mathbb{C}^N$;
note how this contrasts with some of the examples we see shortly ($\ell^p(\mathbb{Z})$ spaces).

For $p = 1$, this norm is called the taxicab norm or Manhattan norm because
$\|x\|_1$ represents the driving distance from the origin to $x$ following a rectilinear
street grid. For $p = 2$, (1.35a) gives our usual Euclidean norm (1.23a), and
only in that case is a $p$ norm induced by an inner product. The natural extension
of (1.35a) to $p = \infty$ (see Exercise 1.14) defines the $\infty$ norm as:

$$\|x\|_\infty = \max(|x_0|, |x_1|, \ldots, |x_{N-1}|). \tag{1.35b}$$




Using (1.35a) for $p \in (0, 1)$ does not give a norm but can still be a useful quantity.
The failure to satisfy the requirements of a norm and an interpretation of (1.35a)
with $p \to 0$ are explored in Exercise 1.15.

All norms on finite-dimensional spaces are equivalent in the sense that any
two norms bound each other within constant factors (see Exercise 1.16). This is a
crude equivalence that leaves significant differences in which vectors are considered
larger than others, and it does not extend to infinite-dimensional spaces. Figure 1.7
shows this pictorially by showing the sets of unit-norm vectors for different $p$ norms.
All vectors ending on the lines have a unit norm in the corresponding $p$ norm. For
example, with the usual Euclidean norm, unit-norm vectors fall on a circle; on the
other hand, in the 1 norm they fall on the diamond-shaped polygon. Note that only for
$p = 2$ is the set of unit-norm vectors invariant to rotation of the coordinate system.
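The following sketch computes (1.35a) and (1.35b) for one arbitrary vector. It then checks a standard chain of equivalence bounds on $\mathbb{C}^N$: $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le \sqrt{N}\,\|x\|_2 \le N\,\|x\|_\infty$. These are one concrete instance of the constant factors mentioned above; Exercise 1.16 treats the general case. The last lines rotate a unit vector in $\mathbb{R}^2$ by $45°$. Its 2 norm is unchanged, while its 1 norm grows to about $\sqrt{2}$, in line with the rotation remark about Figure 1.7.

```python
import math

def p_norm(x, p):
    """p norm (1.35a) on C^N for p in [1, inf); p = math.inf gives (1.35b)."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
N = len(x)
n1, n2, ninf = p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf)
print(n1, n2, ninf)

# Standard equivalence bounds between the 1, 2, and infinity norms on C^N.
assert ninf <= n2 <= n1 <= math.sqrt(N) * n2 <= N * ninf

def rotate(v, theta):
    """Rotate a vector in R^2 by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

u = [1.0, 0.0]
w = rotate(u, math.pi / 4)
print(p_norm(u, 2), p_norm(w, 2))  # both 1 (up to rounding)
print(p_norm(u, 1), p_norm(w, 1))  # 1.0 versus about 1.414
```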



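$\ell^p(\mathbb{Z})$ Spaces We can define other norms on $\mathbb{C}^{\mathbb{Z}}$ as well (as we did for $\mathbb{C}^N$).
However, because the space is infinite dimensional, the choice of the norm and the requirement
that it be finite restricts $\mathbb{C}^{\mathbb{Z}}$ to a smaller set. For example, for $p \in [1, \infty)$, the $\ell^p$
norm is

$$\|x\|_p = \left(\sum_{n\in\mathbb{Z}} |x_n|^p\right)^{1/p}. \tag{1.36a}$$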
Figure 1.7: Sets of unit-norm vectors for different $p$ norms. From outside in: $p = \infty$
(black), $p = 4$ (dark gray), $p = 2$ (red), $p = 1$ (gray), and $p = 1/2$ (dashed). (The $p = 1/2$
case is not a norm.) Vectors ending on the lines are of unit norm in the corresponding $p$
norm.



Analogously to (1.35b), we extend this to the $\ell^\infty$ norm as

$$\|x\|_\infty = \sup_{n\in\mathbb{Z}} |x_n|. \tag{1.36b}$$



We have already introduced the $\ell^p$ norm for $p = 2$ in (1.28); only in that case is
an $\ell^p$ norm induced by an inner product. We can now define the spaces associated
with the $\ell^p$ norms:



Definition 1.11 ($\ell^p(\mathbb{Z})$) For any $p \in [1, \infty]$, the normed vector space $\ell^p(\mathbb{Z})$ is
the subspace of $\mathbb{C}^{\mathbb{Z}}$ consisting of vectors with finite $\ell^p$ norm.



Solved Exercise 1.2 shows that the subset of $\mathbb{C}^{\mathbb{Z}}$ with vectors of finite $\ell^p$ norm
forms a subspace. Since $\ell^p(\mathbb{Z})$ is defined as a subspace of $\mathbb{C}^{\mathbb{Z}}$, it inherits the operations
of vector addition and scalar multiplication from $\mathbb{C}^{\mathbb{Z}}$. The norm is the $\ell^p$ norm
(1.36).

Example 1.8 Consider the sequence $x$ given by

$$x_n = \begin{cases} 0, & n \le 0; \\ 1/n^\alpha, & n > 0, \end{cases}$$

for some real number $\alpha \ge 0$. Let us determine which of the spaces $\ell^p(\mathbb{Z})$ contain
$x$. To check whether $x$ is in $\ell^p(\mathbb{Z})$ for $p \in [1, \infty)$, we need to determine whether

$$\|x\|_p^p = \sum_{n=1}^{\infty} \frac{1}{n^{p\alpha}}$$

converges. The necessary and sufficient condition for convergence is $p\alpha > 1$, so
we conclude that $x \in \ell^p(\mathbb{Z})$ for $p > 1/\alpha$ and $\alpha > 0$. For $\alpha = 0$, the above does
not converge. For $x \in \ell^\infty(\mathbb{Z})$, $x$ must be bounded, which occurs for all $\alpha \ge 0$.

This example illustrates a simple inclusion property proven as Exercise 1.17:

$$p < q \quad\text{implies}\quad \ell^p(\mathbb{Z}) \subset \ell^q(\mathbb{Z}). \tag{1.37}$$

This can loosely be visualized with Figure 1.7: the larger the value of $p$, the larger
the set of vectors with norm at most 1. In particular, $\ell^1(\mathbb{Z}) \subset \ell^2(\mathbb{Z})$. In other words,
if a sequence has a finite $\ell^1$ norm, then it has a finite $\ell^2$ norm. Beware that the
opposite is not true; if a sequence has a finite $\ell^2$ norm, it does not follow that it has
a finite $\ell^1$ norm.

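Example 1.9 Consider the sequence $x_n = 1/n$, for $n \in \mathbb{Z}^+$:

$$\|x\|_2^2 = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \text{ converges, while } \|x\|_1 = \sum_{n=1}^{\infty} \frac{1}{n} \text{ diverges.}$$

Thus, $x \in \ell^2(\mathbb{Z})$ and $x \notin \ell^1(\mathbb{Z})$.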
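Partial sums make the contrast in Example 1.9 visible. A finite computation cannot prove convergence or divergence, so this sketch is only an illustration. The partial sums of $1/n^2$ settle near $\pi^2/6$. Those of $1/n$ keep growing, tracking $\ln K$ plus the Euler-Mascheroni constant $\gamma \approx 0.5772$, a standard asymptotic fact. The cutoffs $K$ are arbitrary choices.

```python
import math

def partial_sum(f, K):
    """Sum of f(n) for n = 1, ..., K."""
    return sum(f(n) for n in range(1, K + 1))

for K in (10, 1_000, 100_000):
    s2 = partial_sum(lambda n: 1.0 / n ** 2, K)   # partial sums of ||x||_2^2
    s1 = partial_sum(lambda n: 1.0 / n, K)        # partial sums of ||x||_1
    print(f"K={K:>7}: sum 1/n^2 = {s2:.6f}, sum 1/n = {s1:.3f}, ln K = {math.log(K):.3f}")

print(f"pi^2 / 6 = {math.pi ** 2 / 6:.6f}")
```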



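$\mathcal{L}^p(\mathbb{R})$ Spaces Like for sequences, we can define other norms on $\mathbb{C}^{\mathbb{R}}$ as well. Again,
because the space is infinite dimensional, the choice of the norm and the requirement that it be
finite restricts $\mathbb{C}^{\mathbb{R}}$ to a smaller set. For example, for $p \in [1, \infty)$, the $\mathcal{L}^p$ norm is

$$\|x\|_p = \left(\int_{-\infty}^{\infty} |x(t)|^p\,dt\right)^{1/p}. \tag{1.38a}$$

The extension to $p = \infty$ leads to the $\mathcal{L}^\infty$ norm as

$$\|x\|_\infty = \operatorname*{ess\,sup}_{t\in\mathbb{R}} |x(t)|. \tag{1.38b}$$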



We have already introduced the $\mathcal{L}^p$ norm for $p = 2$ in (1.29); only in that case is
an $\mathcal{L}^p$ norm induced by an inner product. We can now define the spaces associated
with the $\mathcal{L}^p$ norms:



Definition 1.12 ($\mathcal{L}^p(\mathbb{R})$) For any $p \in [1, \infty]$, the normed vector space $\mathcal{L}^p(\mathbb{R})$ is
the subspace of $\mathbb{C}^{\mathbb{R}}$ consisting of vectors with finite $\mathcal{L}^p$ norm.



Since $\mathcal{L}^p(\mathbb{R})$ is defined as a subspace of $\mathbb{C}^{\mathbb{R}}$, it inherits the operations of vector
addition and scalar multiplication from $\mathbb{C}^{\mathbb{R}}$. The norm is the $\mathcal{L}^p$ norm (1.38).

One can also use the same norms on different domains; for example, we can
define the domain to be $[a, b]$ and use a finite $\mathcal{L}^p$ norm on it to yield the space
$\mathcal{L}^p([a, b])$ (as we did for $\mathcal{L}^2(\mathbb{R})$ and $\mathcal{L}^2([a, b])$). We will use this in Chapters 2 and
3, where we will often be operating on $\mathcal{L}^p([-\pi, \pi])$.
1.3 Hilbert Spaces

We are going to do most of our work in Hilbert spaces. These are inner product 
spaces seen in the previous section, with the additional requirement of completeness. 
Completeness is somewhat technical, and for a basic understanding it will suffice 
to have faith that we work in vector spaces of sequences and functions in which 
convergence makes sense. We will furthermore be mostly concerned with separable 
Hilbert spaces because these spaces have countable bases. 

1.3.1 Convergence 

Convergence of sequences of numbers should be a familiar concept; it is reviewed
in Appendix 1.A.2. Convergence of a sequence of vectors requires a metric, and we
limit our attention to metrics induced by norms.



Definition 1.13 (Convergent sequence of vectors) A sequence of vectors
$x_0, x_1, \ldots$ in a normed vector space $V$ is said to converge to $v \in V$ when
$\lim_{k\to\infty} \|v - x_k\| = 0$. In other words: Given any $\epsilon > 0$, there exists a $K_\epsilon$ such
that

$$\|v - x_k\| < \epsilon \quad\text{for all } k > K_\epsilon.$$



The elements of a convergent sequence eventually stay arbitrarily close to v. Not 
only does the definition of convergence of sequences of vectors use a norm, but 
whether a sequence converges can depend on the choice of norm. This is illustrated 
in the following example: 

Example 1.10 (Convergence in different norms)
(i) For each $k \in \mathbb{Z}^+$, let

$$x_k(t) = \begin{cases} 1, & \text{for } t \in [0, 1/k]; \\ 0, & \text{otherwise.} \end{cases}$$

Also, let $v(t) = 0$ for all $t$. For any $p \in [1, \infty)$, using the expression for the
$\mathcal{L}^p$ norm (1.38a),

$$\|v - x_k\|_p = \left(\int_{-\infty}^{\infty} |v(t) - x_k(t)|^p\,dt\right)^{1/p} = \left(\frac{1}{k}\right)^{1/p} \to 0,$$

so $x_1, x_2, \ldots$ converges to $v$. For $p = \infty$, using the expression for the $\mathcal{L}^\infty$
norm (1.38b), $\|v - x_k\|_\infty = 1$ for all $k$, so the sequence does not converge
to $v$ under the $\mathcal{L}^\infty$ norm.

(ii) Let $\alpha \in (0, 1)$, and for each $k \in \mathbb{Z}^+$, let

$$x_{k,n} = \begin{cases} 1/k^\alpha, & \text{for } n \in \{1, 2, \ldots, k\}; \\ 0, & \text{otherwise.} \end{cases}$$

Also, let $v_n = 0$ for all $n$. Using the expression for the $\ell^p$ norm (1.36a),

$$\|v - x_k\|_p = \left(\sum_{n=1}^{k} \frac{1}{k^{\alpha p}}\right)^{1/p} = \left(\frac{1}{k}\right)^{(\alpha p - 1)/p},$$

so $x_1, x_2, \ldots$ converges to $v$ when $p \in (1/\alpha, \infty)$. For $p = \infty$, using the
expression for the $\ell^\infty$ norm (1.36b), $\|v - x_k\|_\infty = 1/k^\alpha$ for all $k$, so the
sequence converges to $v$ under the $\ell^\infty$ norm as well.
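The closed form in part (ii) can be checked by building $x_k$ explicitly and computing its $\ell^p$ norm. The sketch uses $\alpha = 1/2$, an arbitrary choice, so that $1/\alpha = 2$. With this value the distances grow for $p = 1.5$ and stay at 1 for the boundary case $p = 2$. They decay for $p = 4$ and for $p = \infty$. This matches the open interval $(1/\alpha, \infty)$ in the text. The function name is illustrative.

```python
import math

def dist_to_zero(k, alpha, p):
    """||v - x_k||_p in Example 1.10(ii): x_k has k entries equal to 1/k**alpha, v = 0."""
    entries = [k ** (-alpha)] * k
    if p == math.inf:
        return max(entries)
    return sum(e ** p for e in entries) ** (1.0 / p)

alpha = 0.5  # so 1/alpha = 2
for p in (1.5, 2.0, 4.0, math.inf):
    dists = [dist_to_zero(k, alpha, p) for k in (1, 10, 100, 10_000)]
    print(p, [round(d, 4) for d in dists])
```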



As reviewed in Appendix 1.A.1, a set of real numbers is closed if and only if it
contains the limits of all its convergent sequences; for example, $(0, 1]$ is not closed in
$\mathbb{R}$ since the sequence of numbers $x_k = 1/k$, $k \in \mathbb{Z}^+$, lies in $(0, 1]$ and converges, but
its limit point 0 is not in $(0, 1]$. Carrying this over to the Hilbert space setting yields
the following:



Definition 1.14 (Closed subspace) A subspace $S$ of a normed vector space
$V$ is called closed when it contains all limits of sequences of vectors in $S$.



Subspaces of finite-dimensional normed vector spaces are always closed. Exercise 1.19
gives an example of a subspace of an infinite-dimensional normed vector
space that is not closed.

Subspaces often arise as the span of a set of vectors. As the following example
shows, the span of an infinite set of vectors is not necessarily closed. For this reason,
we frequently work with the closure of a span.

Example 1.11 (Span may not be closed) Consider the following infinite set
of vectors from $\ell^2(\mathbb{N})$: for each $k \in \mathbb{N}$, let the sequence $s_k$ be 0 except for a 1
in the $k$th position. Recall from Definition 1.4 that the span is all finite linear
combinations, even when the set of vectors is infinite. Thus, $\operatorname{span}(\{s_0, s_1, \ldots\})$ is
the subspace of all vectors in $\ell^2(\mathbb{N})$ that have a finite number of nonzero entries.
To prove that the span is not closed, we must find a sequence of vectors in the
span (each having finitely many nonzero entries) that converges to a vector not
in the span (having infinitely many nonzero entries). For example, let $v$ be any
sequence in $\ell^2(\mathbb{N})$ with infinite support, for example $v_n = 1/(n + 1)$ for $n \in \mathbb{N}$.
Then for each $k \in \mathbb{N}$, define vector $x_k \in \ell^2(\mathbb{N})$ by

$$x_{k,n} = \begin{cases} v_n, & \text{for } n = 0, 1, \ldots, k; \\ 0, & \text{otherwise.} \end{cases}$$

For each $k \in \mathbb{N}$, the vector $x_k$ is a linear combination of $\{s_0, s_1, \ldots, s_k\}$. While
the sequence $x_0, x_1, \ldots$ converges to $v$ (under the $\ell^2(\mathbb{N})$ norm, since
$\|v - x_k\|^2 = \sum_{n=k+1}^{\infty} |v_n|^2$ is the tail of a convergent series), its limit $v$ is not
in the span.
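The convergence claim in Example 1.11 can be watched numerically. The distance $\|v - x_k\|_2$ is the square root of the tail $\sum_{n>k} 1/(n+1)^2$. The sketch below computes that tail, truncated after a finite number of terms. The truncation point of 200000 terms is an arbitrary assumption, and it introduces an error of order $10^{-5}$ in the squared distance. For $k = 0$ the exact value is $\sqrt{\pi^2/6 - 1} \approx 0.803$.

```python
import math

def v(n):
    """v_n = 1/(n + 1) for n in N: square summable, with infinite support."""
    return 1.0 / (n + 1)

def dist(k, terms=200_000):
    """Approximate ||v - x_k||_2, where x_k keeps v_0, ..., v_k and zeroes the rest.
    The exact value is an infinite tail sum; here it is truncated after `terms` entries."""
    return math.sqrt(sum(v(n) ** 2 for n in range(k + 1, terms)))

for k in (0, 10, 100, 1000):
    print(k, round(dist(k), 4))
```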

Since the closure of a set is the set of all limit points of convergent sequences in
the set, the closure of the span of an infinite set of vectors is the set of all convergent
infinite linear combinations:

$$\overline{\operatorname{span}}(\{\varphi_k\}_{k\in\mathcal{K}}) = \left\{ \sum_{k\in\mathcal{K}} \alpha_k \varphi_k \;\middle|\; \alpha_k \in \mathbb{C} \text{ and the sum converges} \right\}.$$

The closure of the span of a set of vectors is always a closed subspace.

Figure 1.8: The first few partial sums of $\sum_{n=0}^{\infty} 1/n!$, each rational, converging to the
irrational number $e$.



1.3.2 Completeness 

It is awkward to do analysis in the set of rational numbers $\mathbb{Q}$ instead of in $\mathbb{R}$
because $\mathbb{Q}$ has infinitesimal gaps that can be limit points of sequences in $\mathbb{Q}$. A
familiar example is that $x_k = \sum_{n=0}^{k} 1/n!$ is rational for every nonnegative integer
$k$, but the limit of the sequence is the irrational number $e$ (see Figure 1.8). If
we want the limit of any sequence to make sense, we need to work in $\mathbb{R}$, which is
the completion of $\mathbb{Q}$. Working only in $\mathbb{Q}$, it would be hard to distinguish between
sequences that converge to an irrational number and ones that do not converge at
all: neither would have a limit point in the space.
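Exact rational arithmetic makes the point of Figure 1.8 tangible. The sketch below uses Python's `fractions.Fraction` to compute each partial sum $x_k = \sum_{n=0}^{k} 1/n!$ as an exact ratio of integers. Every $x_k$ therefore lies in $\mathbb{Q}$, even as the gap to $e$ shrinks toward zero. The floating-point comparison with `math.e` is only for display.

```python
from fractions import Fraction
import math

def partial_sum_e(k):
    """x_k = sum of 1/n! for n = 0, ..., k, computed exactly as a rational number."""
    total, term = Fraction(0), Fraction(1)   # term starts at 1/0! = 1
    for n in range(k + 1):
        if n > 0:
            term /= n                        # now term == 1/n!
        total += term
    return total

for k in range(8):
    x_k = partial_sum_e(k)
    print(k, x_k, float(x_k), math.e - float(x_k))
```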

Complete vector spaces are those in which sequences that intuitively ought to 
converge (the Cauchy sequences) have a limit in the same space. 



Definition 1.15 (Cauchy sequence of vectors) A sequence of vectors
$x_0, x_1, \ldots$ in a normed vector space is called a Cauchy sequence when: Given
any $\epsilon > 0$, there exists a $K_\epsilon$ such that

$$\|x_k - x_m\| < \epsilon \quad\text{for all } k, m > K_\epsilon.$$



The elements of a Cauchy sequence eventually stay arbitrarily close to each other.
Thus it may be intuitive that a Cauchy sequence must converge; this is in fact true
for real-valued sequences. However, this is not true in all normed vector spaces, and it gives
us important terminology:
Figure 1.9: Relationships between types of vector spaces (vector spaces, inner product spaces,
Banach spaces). Several examples of vector spaces are marked. For $\mathbb{Q}^N$ and $\mathbb{C}^N$ we assume the
standard inner product. $(V, d)$ represents any vector space with the discrete metric as described
in Exercise 1.13. $(C([a, b]), \|\cdot\|_\infty)$ represents $C([a, b])$ with the $\mathcal{L}^2$ norm replaced by the
$\mathcal{L}^\infty$ norm. $\ell^0(\mathbb{Z})$ is described in Exercise 1.19.



Definition 1.16 (Completeness and Hilbert space) A normed vector 
space V is said to be complete when every Cauchy sequence in V converges to a 
vector in V. A complete inner product space is called a Hilbert space. 



A complete normed vector space is called a Banach space. 

Example 1.12 Ignoring for the moment that Definition 1.1 restricts the set of
scalars to $\mathbb{R}$ or $\mathbb{C}$, consider $\mathbb{Q}$ as a normed vector space over the scalars $\mathbb{Q}$, with
ordinary addition and multiplication and norm $\|x\| = |x|$. This vector space is
not complete because there exist rational sequences with irrational limits, such
as the example of the number $e$ we have just seen (see Figure 1.8).



Standard Spaces 

From Definition 1.16, completeness makes sense only in a normed vector space.
We now comment on the completeness of the standard spaces we discussed in
Section 1.2.4 (see Figure 1.9).

(i) All finite-dimensional spaces are complete.¹⁰ For example, $\mathbb{C}$ as a normed
vector space over $\mathbb{C}$ with ordinary addition and multiplication and with norm
$\|x\| = |x|$ is complete. This can be used to show that $\mathbb{C}^N$ is complete under
ordinary addition and multiplication and with any $p$ norm; see Exercise 1.22.
$\mathbb{C}^N$ under the 2 norm is a Hilbert space.
(ii) All $\ell^p(\mathbb{Z})$ spaces are complete; in particular, $\ell^2(\mathbb{Z})$ is a Hilbert space.
(iii) All $\mathcal{L}^p(\mathbb{R})$ spaces are complete; in particular, $\mathcal{L}^2(\mathbb{R})$ is a Hilbert space. An $\mathcal{L}^p$
space can either be understood to be complete because of Lebesgue measurability
and the use of Lebesgue integration, or it can be taken as the completion
of the space of continuous functions with finite $\mathcal{L}^p$ norm.
(iv) $C^q([a, b])$ spaces are not complete under the norm (1.30); $C([a, b])$ does become
complete when that norm is replaced by the $\mathcal{L}^\infty$ norm. For example,
$C([0, 1])$ is not complete as there exist Cauchy sequences of continuous functions
whose limits are discontinuous (and hence not in $C([0, 1])$); see Solved
Exercise 1.3.
(v) We consider spaces of random variables only under the inner product (1.33)
and norm (1.34). These inner product spaces are complete and hence Hilbert
spaces.



Separability Separability is more technical than completeness. A space is called
separable when it contains a countable dense subset. For example, $\mathbb{R}$ is separable
since $\mathbb{Q}$ is dense in $\mathbb{R}$ and is countable. However, these topological properties are
not of much interest here.

We are interested in separable Hilbert spaces because a Hilbert space contains
a countable basis if and only if it is separable (we formally define a basis in
Section 1.5). The Hilbert spaces that we will use frequently (as marked in Figure 1.9)
are all separable. Also, a closed subspace of a separable Hilbert space is separable,
so it contains a countable basis as well.

1.3.3 Linear Operators 

Having dispensed with technicalities, we are now ready to develop operational
Hilbert space machinery. We start with linear operators, which generalize
finite-dimensional matrix multiplication (see Appendix 1.B for more details).



Definition 1.17 (Linear operator) A function $A : H_0 \to H_1$ is called a linear
operator from $H_0$ to $H_1$ when, for all $x, y$ in $H_0$ and $\alpha$ in $\mathbb{C}$ (or $\mathbb{R}$):
(i) Additivity: $A(x + y) = Ax + Ay$.
(ii) Scalability: $A(\alpha x) = \alpha(Ax)$.

















10 Recall the restriction of the set of scalars to $\mathbb{R}$ or $\mathbb{C}$. Without this restriction, there are
finite-dimensional vector spaces that are not complete, as in Example 1.12.
When domain $H_0$ and codomain $H_1$ are the same, $A$ is also called a linear operator
on $H_0$.



Note the convention of writing $Ax$ instead of $A(x)$, just as is done for matrix
multiplication. In fact, linear operators from $\mathbb{C}^N$ to $\mathbb{C}^M$ and matrices in $\mathbb{C}^{M\times N}$ are
exactly the same thing.

Many concepts from finite-dimensional linear algebra extend to linear operators
on Hilbert spaces in rather obvious ways. For example, the null space or kernel
of a linear operator $A : H_0 \to H_1$ is the subspace of $H_0$ that $A$ maps to 0:

$$\mathcal{N}(A) = \{x \in H_0 \mid Ax = 0\}. \tag{1.39}$$

The range of a linear operator $A : H_0 \to H_1$ is a subspace of $H_1$:

$$\mathcal{R}(A) = \{Ax \in H_1 \mid x \in H_0\}. \tag{1.40}$$

Some generalizations to Hilbert spaces are simplified by limiting attention to bounded
linear operators.

Definition 1.18 (Operator norm and bounded linear operator) The
operator norm of $A$, denoted by $\|A\|$, is defined as

$$\|A\| = \sup_{\|x\| = 1} \|Ax\|. \tag{1.41}$$

A linear operator is called bounded when its operator norm is finite.



It is implicit in the definition that $\|x\|$ uses the norm of $H_0$ and $\|Ax\|$ uses the norm
of $H_1$. This concept applies equally well with $H_0$ and $H_1$ replaced by any normed
vector spaces $V_0$ and $V_1$.
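For a real $2\times 2$ matrix with the 2 norm on both sides, the supremum in (1.41) runs over the unit circle. It can therefore be approximated by sweeping $x = [\cos\theta \;\; \sin\theta]^\top$ over a grid of angles. The sketch below does exactly that. It uses the matrix from Example 1.14(i) as a test case, whose operator norm is 4. The brute-force sweep and the grid size are illustrative choices. In practice one would compute the largest singular value instead.

```python
import math

A = [[3.0, 1.0],
     [1.0, 3.0]]   # the matrix from Example 1.14(i)

def apply(A, x):
    """Matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

def operator_norm_estimate(A, samples=100_000):
    """Approximate sup_{||x|| = 1} ||A x|| for a real 2x2 matrix by sweeping
    unit vectors x = [cos(theta), sin(theta)] over a grid of angles."""
    best = 0.0
    for i in range(samples):
        theta = 2 * math.pi * i / samples
        best = max(best, norm2(apply(A, [math.cos(theta), math.sin(theta)])))
    return best

print(operator_norm_estimate(A))   # approximately 4
```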

Example 1.13 (Unbounded operator) Consider $A : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ defined
by

$$(Ax)_n = |n|\,x_n \quad\text{for all } n \in \mathbb{Z}.$$

While this is a linear operator, it is not bounded. To see this by contradiction,
suppose the operator norm $\|A\|$ is finite. Then there is an integer $M$ larger than
$\|A\|$. The sequence that is 0 except for a 1 in the $M$th position has unit norm,
yet its image under $A$ has norm $M > \|A\|$, which is a contradiction.

Linear operators with finite-dimensional domains are always bounded. Conversely,
by limiting attention to bounded linear operators we are able to extend
concepts to Hilbert spaces while maintaining most intuitions from finite-dimensional
linear algebra. For example, bounded linear operators are continuous:

$$\text{if } x_k \to v \text{ then } Ax_k \to Av.$$
Definition 1.19 (Inverse) A bounded linear operator $A : H_0 \to H_1$ is called invertible if there exists a bounded linear operator $B : H_1 \to H_0$ such that

$$BAx = x, \quad \text{for every } x \text{ in } H_0, \tag{1.42a}$$
$$ABy = y, \quad \text{for every } y \text{ in } H_1. \tag{1.42b}$$

When such a $B$ exists, it is unique, is denoted by $A^{-1}$, and is called the inverse of $A$; $B$ is called a left inverse of $A$ when (1.42a) holds and a right inverse of $A$ when (1.42b) holds.



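Example 1.14 (Linear operators)

(i) Ordinary matrix multiplication by the matrix

$$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$$

defines a linear operator on $\mathbb{R}^2$, $A : \mathbb{R}^2 \to \mathbb{R}^2$. It is bounded, and its operator norm (assuming the 2 norm for both the domain and the codomain) is 4. We show here how to obtain the norm of $A$ by direct computation (we could also use the relationship between eigenvalues, singular values, and the operator norm, explored in Exercise 1.67):

$$\begin{aligned}
\sup_{\|x\|=1} \|Ax\| &= \sup_{\theta} \left\| \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \right\| = \sup_{\theta} \left\| \begin{bmatrix} 3\cos\theta + \sin\theta \\ \cos\theta + 3\sin\theta \end{bmatrix} \right\| \\
&= \sup_{\theta} \sqrt{(3\cos\theta + \sin\theta)^2 + (\cos\theta + 3\sin\theta)^2} \\
&= \sup_{\theta} \sqrt{10\cos^2\theta + 10\sin^2\theta + 12\sin\theta\cos\theta} = \sup_{\theta} \sqrt{10 + 6\sin 2\theta} = 4.
\end{aligned}$$

The null space of $A$ is only the vector $0$, the range of $A$ is all of $\mathbb{R}^2$, and

$$A^{-1} = \begin{bmatrix} 3/8 & -1/8 \\ -1/8 & 3/8 \end{bmatrix}.$$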

(ii) Ordinary matrix multiplication by the matrix

$$A = \begin{bmatrix} 1 & j & 0 \\ 0 & 1 & j \end{bmatrix}$$

defines a linear operator $A : \mathbb{C}^3 \to \mathbb{C}^2$. It is bounded, and its operator norm (assuming the 2 norm for both the domain and the codomain) is $\sqrt{3}$. Its null space is $\{[x_0 \;\; jx_0 \;\; -x_0]^\top \mid x_0 \in \mathbb{C}\}$, its range is all of $\mathbb{C}^2$, and it is not invertible. (There exists $B$ satisfying (1.42b), but no $B$ can satisfy (1.42a).)
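(iii) For some fixed complex-valued sequence $(\alpha_k)_{k \in \mathbb{Z}}$, consider the componentwise multiplication

$$(Ax)_k = \alpha_k x_k \tag{1.43a}$$

as a linear operator on $\ell^2(\mathbb{Z})$. We can write this with infinite vectors and matrices as

$$Ax = \begin{bmatrix} \ddots & & & & \\ & \alpha_{-1} & & & \\ & & \alpha_0 & & \\ & & & \alpha_1 & \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ x_0 \\ x_1 \\ \vdots \end{bmatrix}. \tag{1.43b}$$

It is easy to check that Definition 1.17(i) and (ii) are satisfied, but we must constrain $\alpha$ to ensure that the result is in $\ell^2(\mathbb{Z})$. For example, $\|\alpha\|_\infty = M < \infty$ ensures that $Ax$ is in $\ell^2(\mathbb{Z})$ for any $x$ in $\ell^2(\mathbb{Z})$. Furthermore, the operator is bounded and $\|A\| = M$. The operator is invertible when $\inf_k |\alpha_k| > 0$. In this case, the inverse is given by

$$A^{-1}y = \begin{bmatrix} \ddots & & & & \\ & 1/\alpha_{-1} & & & \\ & & 1/\alpha_0 & & \\ & & & 1/\alpha_1 & \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ \vdots \end{bmatrix}.$$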
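The three parts of Example 1.14 can be checked numerically. The sketch below uses plain Python lists. For part (ii) it assumes the matrix with rows $[1 \;\; j \;\; 0]$ and $[0 \;\; 1 \;\; j]$, and for part (iii) it replaces $\mathbb{Z}$ by a hypothetical finite window $k = -3, \dots, 3$ with an illustrative choice of $\alpha_k$; it illustrates the claims but does not prove anything about $\ell^2(\mathbb{Z})$.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def norm2(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))

# (i) A = [[3, 1], [1, 3]]: sweep unit vectors [cos t, sin t] and keep the largest ||Ax||.
A1 = [[3.0, 1.0], [1.0, 3.0]]
angles = [2 * math.pi * k / 3600 for k in range(3600)]
norm_A1 = max(norm2(matvec(A1, [math.cos(t), math.sin(t)])) for t in angles)  # close to 4
A1_inv = [[3 / 8, -1 / 8], [-1 / 8, 3 / 8]]
# A1_inv applied to each column of A1 should give the standard basis vectors.
check_inv = [matvec(A1_inv, col) for col in ([3.0, 1.0], [1.0, 3.0])]

# (ii) Assumed matrix A = [[1, j, 0], [0, 1, j]] : C^3 -> C^2.
A2 = [[1, 1j, 0], [0, 1, 1j]]
x0 = 2 - 1j
null_residual = matvec(A2, [x0, 1j * x0, -x0])  # should be the zero vector
# ||A2||^2 is the largest eigenvalue of the 2x2 Hermitian matrix G = A2 A2^*.
G = [[sum(A2[i][k] * complex(A2[j][k]).conjugate() for k in range(3)) for j in range(2)]
     for i in range(2)]
g00, g01, g11 = G[0][0].real, G[0][1], G[1][1].real
norm_A2 = math.sqrt((g00 + g11) / 2 + math.sqrt(((g00 - g11) / 2) ** 2 + abs(g01) ** 2))

# (iii) Componentwise multiplication on a hypothetical finite window k = -3..3.
alpha = {k: 1.0 + 0.5 * math.cos(k) for k in range(-3, 4)}
M = max(abs(a) for a in alpha.values())          # ||alpha||_inf on the window
random.seed(0)
x = {k: random.gauss(0.0, 1.0) for k in alpha}
Ax = {k: alpha[k] * x[k] for k in alpha}
ratio = norm2(list(Ax.values())) / norm2(list(x.values()))  # never exceeds M
recovered = {k: Ax[k] / alpha[k] for k in alpha}  # A^{-1} A x, valid since inf |alpha_k| > 0
```

Sampling $\theta$ on a grid only gives a lower bound on a supremum in general; here the grid happens to contain $\theta = \pi/4$, where $\sqrt{10 + 6\sin 2\theta}$ reaches 4.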



Adjoint Operator Finite-dimensional linear algebra has many uses for transposes 
and conjugate transposes. The conjugate transpose (or Hermitian transpose) is 
generalized by the adjoint of an operator. 



Definition 1.20 (Adjoint and self-adjoint operators) The linear operator $A^* : H_1 \to H_0$ is called the adjoint of the linear operator $A : H_0 \to H_1$ when

$$\langle Ax, y \rangle_{H_1} = \langle x, A^*y \rangle_{H_0}, \quad \text{for every } x \text{ in } H_0 \text{ and } y \text{ in } H_1. \tag{1.44}$$

When $A = A^*$, the operator $A$ is called self-adjoint or Hermitian.



Note that the adjoint gives a third meaning to $*$, the first two being the complex
conjugate of a scalar and the Hermitian transpose of a matrix. These meanings are 
consistent, as we verify in the first two parts of the following example. 

Example 1.15 (Adjoint operators)

(i) For any Hilbert space $H$, consider $A : H \to H$ given by $Ax = \alpha x$ for some scalar $\alpha$. For any $x$ and $y$ in $H$,

$$\langle Ax, y \rangle = \langle \alpha x, y \rangle \overset{(a)}{=} \alpha\langle x, y \rangle \overset{(b)}{=} \langle x, \alpha^* y \rangle,$$

where (a) follows from linearity in the first argument of the inner product; and (b) from conjugate linearity in the second argument of the inner product. Comparing to (1.44), the adjoint of $A$ is $A^*y = \alpha^* y$. Put simply: the adjoint of multiplication by a scalar is multiplication by the conjugate of the scalar, consistent with using $*$ for conjugation of a scalar.
(ii) Consider a linear operator $A : \mathbb{C}^N \to \mathbb{C}^M$. The $\mathbb{C}^N$ and $\mathbb{C}^M$ inner products can both be written as $\langle x, y \rangle = y^*x$, where $*$ represents the Hermitian transpose. Thus for any $x \in \mathbb{C}^N$ and $y \in \mathbb{C}^M$,

$$\langle Ax, y \rangle_{\mathbb{C}^M} = y^*(Ax) = (y^*A)x \overset{(a)}{=} (A^*y)^*x = \langle x, A^*y \rangle_{\mathbb{C}^N},$$

where in (a) we use $A^*$ to represent the Hermitian transpose of the matrix $A$. Comparing to (1.44), it seems we have reached a tautology, but this is because the use of $A^*$ as the adjoint of linear operator $A$ and the Hermitian transpose of matrix $A$ are consistent. Put simply: the adjoint of multiplication by a matrix is multiplication by the Hermitian transpose of the matrix, consistent with using $*$ for Hermitian transpose of a matrix.
(iii) Consider the linear operator defined in (1.43). For any $x$ and $y$ in $\ell^2(\mathbb{Z})$,

$$\langle Ax, y \rangle_{\ell^2} \overset{(a)}{=} \sum_{n \in \mathbb{Z}} (\alpha_n x_n)\, y_n^* \overset{(b)}{=} \sum_{n \in \mathbb{Z}} x_n\, (\alpha_n^* y_n)^*,$$

where (a) follows from (1.43) and the definition of the $\ell^2(\mathbb{Z})$ inner product, (1.20b); and (b) from commutativity and associativity of scalar multiplication along with $(\alpha_n^*)^* = \alpha_n$. Our goal in expanding $\langle Ax, y \rangle$ above is to see the result as the inner product between $x$ and some linear operator applied to $y$. Comparing the final expression to the definition of the $\ell^2(\mathbb{Z})$ inner product, the componentwise multiplication $(A^*y)_n = \alpha_n^* y_n$ defines the adjoint.

The above examples are among the simplest, and they do not clearly reveal the role of an adjoint. In Hilbert spaces, the relationships between vectors are measured by inner products. The defining relation of the adjoint (1.44) shows that the action of $A$ on $H_0$ is mimicked by the action of the adjoint $A^*$ on $H_1$; this mimicry is only visible through the applicable inner products, so the adjoint itself depends on these inner products. Loosely, if $A$ has some effect, $A^*$ preserves the geometry of that effect while acting with reversed domain and codomain.
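The defining relation (1.44) can be spot-checked for parts (ii) and (iii) of Example 1.15. This sketch draws a random complex matrix and random vectors (the sizes and seed are arbitrary choices), uses $\langle x, y \rangle = \sum_n x_n y_n^*$, and compares both sides; for (iii) it works on a finite window, which is an assumption rather than all of $\ell^2(\mathbb{Z})$.

```python
import random

def inner(x, y):
    """<x, y> = sum_n x_n conj(y_n): linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(M, v):
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def hermitian_transpose(M):
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

random.seed(1)
M_dim, N_dim = 3, 4
A = [rand_vec(N_dim) for _ in range(M_dim)]   # an M x N complex matrix
x, y = rand_vec(N_dim), rand_vec(M_dim)

# Part (ii): <Ax, y> in C^M equals <x, A^* y> in C^N when A^* is the Hermitian transpose.
matrix_gap = abs(inner(matvec(A, x), y) - inner(x, matvec(hermitian_transpose(A), y)))

# Part (iii) on a finite window: the adjoint of multiplication by alpha_n
# is multiplication by conj(alpha_n).
alpha, z = rand_vec(N_dim), rand_vec(N_dim)
diag_gap = abs(inner([a * c for a, c in zip(alpha, x)], z)
               - inner(x, [a.conjugate() * c for a, c in zip(alpha, z)]))
```

Both gaps are zero up to floating-point rounding; replacing the Hermitian transpose by the plain transpose would generally break the equality for complex data.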

In general, finding an adjoint requires some ingenuity, as we now show: 

Example 1.16 (Local averaging and its adjoint) The operator

$$(Ax)_n = \int_{n-1/2}^{n+1/2} x(t)\, dt \tag{1.45}$$

takes local averages of the function $x(t)$ to yield a sequence $(Ax)_n$. (This operation is depicted in Figure 1.10 and is a form of sampling, covered in detail in Chapter 4.) We will first verify that $A$ is a linear operator from $\mathcal{L}^2(\mathbb{R})$ to $\ell^2(\mathbb{Z})$ and then find its adjoint.

[Figure 1.10 panels: (a) $x(t)$; (b) $(Ax)_n$; (c) $y_n$; (d) $(A^*y)(t)$.]

Figure 1.10: Illustration of the adjoint of an operator. (a) We start with a function $x$ in $\mathcal{L}^2(\mathbb{R})$. (b) The local averaging operator $A$ in (1.45) gives a sequence in $\ell^2(\mathbb{Z})$. (c) $y$ is an arbitrary sequence in $\ell^2(\mathbb{Z})$. (d) The adjoint $A^*$ is a linear operator from $\ell^2(\mathbb{Z})$ to $\mathcal{L}^2(\mathbb{R})$ that uniquely preserves geometry in that $\langle Ax, y \rangle_{\ell^2} = \langle x, A^*y \rangle_{\mathcal{L}^2}$. The adjoint of local averaging is to form a piecewise-constant function as in (1.48).



The operator $A$ clearly satisfies Definition 1.17(i) and (ii); we just need to be sure that the result is in $\ell^2(\mathbb{Z})$. Given $x \in \mathcal{L}^2(\mathbb{R})$, let us compute the $\ell^2$ norm of $Ax$:

$$\|Ax\|_{\ell^2}^2 \overset{(a)}{=} \sum_{n \in \mathbb{Z}} |(Ax)_n|^2 \overset{(b)}{=} \sum_{n \in \mathbb{Z}} \left| \int_{n-1/2}^{n+1/2} x(t)\, dt \right|^2 \overset{(c)}{\le} \sum_{n \in \mathbb{Z}} \int_{n-1/2}^{n+1/2} |x(t)|^2\, dt = \|x\|_{\mathcal{L}^2}^2,$$

where (a) follows from the definition of the $\ell^2$ norm (1.28); (b) from (1.45); and (c) from (1.32). Thus, $Ax$ is indeed in $\ell^2(\mathbb{Z})$ since its norm is bounded by $\|x\|_{\mathcal{L}^2}$, which we know is finite since $x \in \mathcal{L}^2(\mathbb{R})$.

We now derive the adjoint of the operator (1.45). To do this, we must find an operator $A^* : \ell^2(\mathbb{Z}) \to \mathcal{L}^2(\mathbb{R})$ such that $\langle Ax, y \rangle_{\ell^2} = \langle x, A^*y \rangle_{\mathcal{L}^2}$ for any $x \in \mathcal{L}^2(\mathbb{R})$ and $y \in \ell^2(\mathbb{Z})$. After expanding both expressions using the definitions of the two inner products, the unique choice for $A^*y$ will be clear:

$$\langle Ax, y \rangle_{\ell^2} \overset{(a)}{=} \sum_{n \in \mathbb{Z}} (Ax)_n\, y_n^* \overset{(b)}{=} \sum_{n \in \mathbb{Z}} \left( \int_{n-1/2}^{n+1/2} x(t)\, dt \right) y_n^* \overset{(c)}{=} \sum_{n \in \mathbb{Z}} \int_{n-1/2}^{n+1/2} x(t)\, y_n^*\, dt, \tag{1.46}$$

where (a) follows from the definition of an inner product in $\ell^2(\mathbb{Z})$, (1.28); (b) from (1.45); and in (c) we pull $y_n^*$ into the integral since it does not depend on $t$. For this final expression to match

$$\langle x, A^*y \rangle_{\mathcal{L}^2} = \int_{-\infty}^{\infty} x(t) \left( (A^*y)(t) \right)^* dt \tag{1.47}$$

for arbitrary $x$ and $y$, we must define $A^*y$ as the piecewise-constant function

$$(A^*y)(t) = y_n \quad \text{for } t \in [n - \tfrac{1}{2}, n + \tfrac{1}{2}). \tag{1.48}$$

Then the integral in (1.47) breaks into the sum of integrals in (1.46).
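A discretized version of Example 1.16 makes the adjoint relation concrete. This is a minimal sketch under stated assumptions: the test signal `x_func`, the sequence `y`, the window size `N`, and the cell count `S` are hypothetical choices; each unit interval is split into `S` cells and integrals become midpoint sums; and signals are real-valued so conjugation can be dropped.

```python
import math

# Assumptions (illustration only): x(t) is treated as zero outside [-N - 1/2, N + 1/2),
# each interval [n - 1/2, n + 1/2) is split into S cells of width h = 1/S,
# and integrals are replaced by midpoint sums. All signals are real-valued.
N, S = 4, 200
h = 1.0 / S

def x_func(t):
    return math.exp(-t * t / 4) * math.cos(2 * t)

# Samples of x at cell midpoints, grouped by the interval index n.
x_cells = {n: [x_func(n - 0.5 + (i + 0.5) * h) for i in range(S)] for n in range(-N, N + 1)}

def local_average(xc):
    """(Ax)_n ~ integral of x over [n - 1/2, n + 1/2), as in (1.45)."""
    return {n: h * sum(vals) for n, vals in xc.items()}

def adjoint(y):
    """(A* y)(t) = y_n on [n - 1/2, n + 1/2), as in (1.48), stored cell by cell."""
    return {n: [y[n]] * S for n in y}

def inner_seq(u, v):   # l^2(Z) inner product on the window
    return sum(u[n] * v[n] for n in u)

def inner_fun(f, g):   # L^2(R) inner product approximated by a midpoint sum
    return sum(h * a * b for n in f for a, b in zip(f[n], g[n]))

y = {n: (-1.0) ** n / (1 + abs(n)) for n in range(-N, N + 1)}
Ax = local_average(x_cells)
lhs = inner_seq(Ax, y)                  # <Ax, y>
rhs = inner_fun(x_cells, adjoint(y))    # <x, A* y>
norm_Ax_sq, norm_x_sq = inner_seq(Ax, Ax), inner_fun(x_cells, x_cells)
```

The two inner products agree up to rounding because the discrete sums are merely regrouped, mirroring how (1.47) splits into (1.46); the norm comparison reflects the bound $\|Ax\|_{\ell^2} \le \|x\|_{\mathcal{L}^2}$ shown above.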
The following theorem summarizes several key properties of the adjoint: 

Theorem 1.21 (Adjoint properties) Let $A : H_0 \to H_1$ be a bounded linear operator.

(i) The adjoint $A^*$, defined through (1.44), exists.
(ii) The adjoint $A^*$ is unique.
(iii) The adjoint of $A^*$ equals the original operator, $(A^*)^* = A$.
(iv) The operators $AA^*$ and $A^*A$ are self-adjoint.
(v) The operator norms of $A$ and $A^*$ are equal, $\|A^*\| = \|A\|$.
(vi) If $A$ is invertible, the adjoint of the inverse and the inverse of the adjoint are equal, $(A^{-1})^* = (A^*)^{-1}$.
(vii) Let $B : H_0 \to H_1$ be a bounded linear operator. Then $(A + B)^* = A^* + B^*$.
(viii) Let $B : H_1 \to H_2$ be a bounded linear operator. Then $(BA)^* = A^*B^*$.

Proof. Parts (i) and (v) are the most technically challenging and go beyond our scope; proofs based on the Riesz representation theorem can be found in texts such as [93]. Parts (ii) and (iii) are proven below, with the remaining parts left for Exercise 1.26.

(ii) Suppose $B$ and $C$ are adjoints of $A$. Then for any $x$ in $H_0$ and $y$ in $H_1$,

$$0 = \langle Ax, y \rangle - \langle Ax, y \rangle \overset{(a)}{=} \langle x, By \rangle - \langle x, Cy \rangle \overset{(b)}{=} \langle x, By - Cy \rangle \overset{(c)}{=} \langle x, (B - C)y \rangle,$$

where (a) follows from (1.44); (b) from distributivity of the inner product; and (c) from additivity of the operators. Since this holds for every $x$ in $H_0$, it in particular holds for $x = (B - C)y$. By the positive definiteness of the inner product, we must have $(B - C)y = 0$ for every $y$ in $H_1$. This implies $By = Cy$ for every $y$ in $H_1$, so the adjoint is unique.
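(iii) For any $x$ in $H_1$ and $y$ in $H_0$,

$$\langle A^*x, y \rangle \overset{(a)}{=} \langle y, A^*x \rangle^* \overset{(b)}{=} \langle Ay, x \rangle^* \overset{(c)}{=} \langle x, Ay \rangle,$$

where (a) follows from Hermitian symmetry of the inner product; (b) from (1.44); and (c) from Hermitian symmetry of the inner product. Comparing with (1.44) applied to the operator $A^*$ shows that $(A^*)^* = A$.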

The adjoint of a bounded linear operator provides key relationships between subspaces (Figure 1.36 in Appendix 1.B illustrates the case when the operator is a finite-dimensional matrix):

$$\mathcal{R}(A)^\perp = \mathcal{N}(A^*), \tag{1.49a}$$
$$\mathcal{R}(A) = \mathcal{N}(A^*)^\perp. \tag{1.49b}$$

To see that $\mathcal{N}(A^*) \subseteq \mathcal{R}(A)^\perp$, first let $y \in \mathcal{N}(A^*)$ and $y' \in \mathcal{R}(A)$. Then since $y' = Ax$ for some $x$, we can compute

$$\langle y', y \rangle = \langle Ax, y \rangle = \langle x, A^*y \rangle = \langle x, 0 \rangle = 0,$$

which shows $y \perp \mathcal{R}(A)$. Conversely, to see that $\mathcal{R}(A)^\perp \subseteq \mathcal{N}(A^*)$, let $y \in \mathcal{R}(A)^\perp$ and let $x$ be any vector in the domain of $A$. Then

$$0 = \langle Ax, y \rangle = \langle x, A^*y \rangle;$$

choosing $x = A^*y$ gives $\|A^*y\|^2 = 0$, so $A^*y = 0$ by positive definiteness of the inner product, and thus $y \in \mathcal{N}(A^*)$. These arguments prove (1.49a). The subtleties of infinite dimensions, for example that the range of a bounded linear operator may not be closed, make proving (1.49b) a bit more difficult. Linear operators that arise in later chapters have closed ranges.
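Relation (1.49a) can be seen on a small rank-deficient matrix. The matrix below is a hypothetical example whose range is the line spanned by $[1 \;\; 2 \;\; 1]^\top$; for a real matrix the adjoint is the transpose, and every sampled vector $Ax$ should be orthogonal to vectors in $\mathcal{N}(A^*)$.

```python
import random

# Hypothetical rank-1 example: A maps R^2 to R^3, and R(A) = span{[1, 2, 1]}.
A = [[1.0, 2.0], [2.0, 4.0], [1.0, 2.0]]
A_star = [list(col) for col in zip(*A)]   # for a real matrix the adjoint is the transpose

def matvec(M, v):
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two vectors in N(A*): both satisfy y0 + 2*y1 + y2 = 0.
null_A_star = [[2.0, -1.0, 0.0], [1.0, 0.0, -1.0]]
null_residual = max(abs(c) for y in null_A_star for c in matvec(A_star, y))

# Every Ax in R(A) should be orthogonal to both of them, as (1.49a) predicts.
random.seed(2)
samples = [matvec(A, [random.gauss(0, 1), random.gauss(0, 1)]) for _ in range(100)]
max_abs_inner = max(abs(inner(v, y)) for v in samples for y in null_A_star)
```

In this finite-dimensional case the range is automatically closed, so (1.49b) also holds: $\mathcal{R}(A)$ is exactly the set of vectors orthogonal to both null-space vectors.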

Unitary Operators Unitary operators are important because they preserve geometry (lengths and angles) when mapping one Hilbert space to another.

Definition 1.22 (Unitary operators) A bounded linear operator $A : H_0 \to H_1$ is called unitary when
(i) it is invertible; and
(ii) it preserves inner products,

$$\langle Ax, Ay \rangle_{H_1} = \langle x, y \rangle_{H_0} \quad \text{for every } x, y \text{ in } H_0. \tag{1.50}$$

Preservation of inner products leads to preservation of norms:

$$\|Ax\|^2 = \langle Ax, Ax \rangle = \langle x, x \rangle = \|x\|^2. \tag{1.51}$$

In Fourier theory, this is called Parseval's equality; it is used extensively in the 
book. 
The following theorem provides conditions equivalent to the definition of a unitary operator. These conditions are reminiscent of the standard definition of a unitary matrix.

Theorem 1.23 (Unitary linear operator) A bounded linear operator $A : H_0 \to H_1$ is unitary if and only if $A^{-1} = A^*$.

Proof. Condition (1.50) is equivalent to $A^*$ being a left inverse of $A$:

$$A^*A = I \quad \text{on } H_0. \tag{1.52a}$$

To see that (1.50) implies (1.52a), note that for every $x, y$ in $H_0$,

$$\langle A^*Ax, y \rangle \overset{(a)}{=} \langle Ax, Ay \rangle \overset{(b)}{=} \langle x, y \rangle,$$

where (a) follows from the definition of adjoint; and (b) from (1.50). Since this holds for every $y$, $A^*Ax = x$. Conversely, to see that (1.52a) implies (1.50), note that

$$\langle Ax, Ay \rangle \overset{(a)}{=} \langle x, A^*Ay \rangle \overset{(b)}{=} \langle x, y \rangle,$$

where (a) follows from the definition of adjoint; and (b) from (1.52a).

Combining (1.50) with invertibility gives that $A^*$ is a right inverse of $A$:

$$AA^* = I \quad \text{on } H_1. \tag{1.52b}$$

To verify (1.52b), note that for every $x, y$ in $H_1$,

$$\langle AA^*x, y \rangle = \langle AA^*x, AA^{-1}y \rangle \overset{(a)}{=} \langle A^*x, A^{-1}y \rangle \overset{(b)}{=} \langle x, AA^{-1}y \rangle = \langle x, y \rangle,$$

where (a) follows from (1.50); and (b) from the definition of adjoint.

The desired equivalence follows: If $A$ is unitary, then $A$ is invertible and (1.50) holds, so both conditions (1.52) hold, so $A^{-1} = A^*$. Conversely, if $A^{-1} = A^*$, then $A$ is invertible and (1.52a) holds, implying (1.50).
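A familiar unitary operator is the normalized DFT matrix, built here from the principal root of unity $W_N = e^{-j2\pi/N}$ as $F_{k,n} = W_N^{kn}/\sqrt{N}$. The sketch below (with an arbitrary size $N = 8$ and a random test vector) checks both (1.52a) and (1.52b) and the norm preservation (1.51).

```python
import cmath
import math
import random

# Normalized DFT matrix F[k][n] = W_N^(k n) / sqrt(N), with W_N = exp(-j 2 pi / N).
N = 8
W = cmath.exp(-2j * math.pi / N)
F = [[W ** (k * n) / math.sqrt(N) for n in range(N)] for k in range(N)]
F_star = [[F[n][k].conjugate() for n in range(N)] for k in range(N)]  # Hermitian transpose

def matmul(P, Q):
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))

left = matmul(F_star, F)    # A* A, expected to be I on the domain, as in (1.52a)
right = matmul(F, F_star)   # A A*, expected to be I on the codomain, as in (1.52b)
identity_error = max(abs(left[i][j] - (1.0 if i == j else 0.0))
                     + abs(right[i][j] - (1.0 if i == j else 0.0))
                     for i in range(N) for j in range(N))

random.seed(3)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
parseval_gap = abs(norm(matvec(F, x)) - norm(x))   # (1.51): ||Fx|| = ||x||
```

Dropping the $1/\sqrt{N}$ factor would still give an invertible matrix, but it would scale every norm by $\sqrt{N}$, so it would no longer be unitary.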

Eigenvalues and Eigenvectors The concept of an eigenvector generalizes from finite-dimensional linear algebra to our Hilbert space setting. Like other concepts that apply to matrices only when they are square, the generalization applies when the domain and codomain of a linear operator are the same Hilbert space. We call an eigenvector an eigensequence when the signal domain is $\mathbb{Z}$ or a subset of $\mathbb{Z}$ (for example, in Chapter 2); we call it an eigenfunction when the signal domain is $\mathbb{R}$ or an interval $[a, b]$ (for example, in Chapter 3).

Definition 1.24 (Eigenvector of a linear operator) An eigenvector of a linear operator $A : H \to H$ is a nonzero vector $v \in H$ such that

$$Av = \lambda v, \tag{1.53}$$

for some $\lambda \in \mathbb{C}$. The constant $\lambda$ is called the corresponding eigenvalue and $(\lambda, v)$ is called an eigenpair.
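The eigenvalues and eigenvectors of a self-adjoint operator $A$ have several useful properties:

(i) All eigenvalues are real: If $(\lambda, v)$ is an eigenpair,

$$\lambda\langle v, v \rangle = \langle \lambda v, v \rangle = \langle Av, v \rangle = \langle v, Av \rangle = \langle v, \lambda v \rangle = \lambda^*\langle v, v \rangle,$$

so, since $\langle v, v \rangle > 0$ for the nonzero vector $v$, $\lambda = \lambda^*$ is real.
(ii) Eigenvectors corresponding to distinct eigenvalues are orthogonal: If $(\lambda_0, v_0)$ and $(\lambda_1, v_1)$ are eigenpairs with $\lambda_0 \ne \lambda_1$,

$$\lambda_0\langle v_0, v_1 \rangle = \langle \lambda_0 v_0, v_1 \rangle = \langle Av_0, v_1 \rangle = \langle v_0, Av_1 \rangle = \langle v_0, \lambda_1 v_1 \rangle = \lambda_1^*\langle v_0, v_1 \rangle = \lambda_1\langle v_0, v_1 \rangle,$$

where the last step uses (i); since $\lambda_0 \ne \lambda_1$, we conclude $\langle v_0, v_1 \rangle = 0$.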
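Both properties can be seen on a small Hermitian matrix. The matrix below is a hypothetical example with complex off-diagonal entries; the sketch uses the closed-form eigenvalues of a $2 \times 2$ Hermitian matrix, forms eigenvectors $[1, (\lambda - a)/b]$ (valid because $b \ne 0$), and checks that the eigenvalues are real and the eigenvectors orthogonal.

```python
import math

# A 2x2 Hermitian (self-adjoint) matrix chosen for illustration.
A = [[2, 1j], [-1j, 2]]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def inner(x, y):
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

# Eigenvalues of [[a, b], [conj(b), d]] with real a, d:
# (a + d)/2 +/- sqrt(((a - d)/2)^2 + |b|^2), which are always real.
a, b, d = A[0][0], A[0][1], A[1][1]
disc = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
lam0, lam1 = (a + d) / 2 + disc, (a + d) / 2 - disc   # 3.0 and 1.0 for this matrix

# Eigenvectors [1, (lam - a)/b]; valid here because b != 0.
v0 = [1, (lam0 - a) / b]
v1 = [1, (lam1 - a) / b]
residuals = [max(abs(p - lam * q) for p, q in zip(matvec(A, v), v))
             for lam, v in ((lam0, v0), (lam1, v1))]
orthogonality = abs(inner(v0, v1))   # distinct eigenvalues, so this should vanish
```

For a non-Hermitian matrix the same formula would not apply, and eigenvalues can be complex even when all entries are real.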

Positive Definite Operators Positive definiteness can also be generalized from square Hermitian matrices to self-adjoint operators on a general Hilbert space.

Definition 1.25 (Definite linear operator) A self-adjoint operator $A : H \to H$ is called
(i) positive semidefinite or nonnegative definite, written $A \ge 0$, when

$$\langle Ax, x \rangle \ge 0 \quad \text{for all } x \in H; \tag{1.54a}$$

(ii) positive definite, written $A > 0$, when

$$\langle Ax, x \rangle > 0 \quad \text{for all nonzero } x \in H; \tag{1.54b}$$

(iii) negative semidefinite or nonpositive definite when $-A$ is positive semidefinite; and
(iv) negative definite when $-A$ is positive definite.









As suggested by the notation, positive semidefiniteness defines a partial order on self-adjoint operators. When $A$ and $B$ are self-adjoint operators defined on the same Hilbert space, $A \ge B$ means $A - B \ge 0$, that is, $A - B$ is a positive semidefinite operator.

As noted above, all eigenvalues of a self-adjoint operator are real. Positive definiteness is equivalent to all eigenvalues being positive; positive semidefiniteness is equivalent to all eigenvalues being nonnegative. Exercise 1.27 develops a proof of these facts.
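The link between $\langle Ax, x \rangle$ and eigenvalues can be seen on $2 \times 2$ real symmetric matrices. In this sketch, $A$ is the matrix of Example 1.14(i) (eigenvalues 2 and 4), while $B$ is a hypothetical semidefinite example (eigenvalues 0 and 2); a sampled search over unit vectors approximates the minimum of $\langle Mx, x \rangle$, which equals the smallest eigenvalue.

```python
import math

def quad_form(M, x):
    """<Mx, x> for a real 2x2 matrix M and a real vector x."""
    Mx = [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]
    return Mx[0] * x[0] + Mx[1] * x[1]

def min_on_unit_circle(M, num=3600):
    """Smallest sampled value of <Mx, x> over unit vectors x = [cos t, sin t]."""
    return min(quad_form(M, [math.cos(t), math.sin(t)])
               for t in (2 * math.pi * k / num for k in range(num)))

A = [[3.0, 1.0], [1.0, 3.0]]   # eigenvalues 2 and 4: positive definite
B = [[1.0, 1.0], [1.0, 1.0]]   # eigenvalues 0 and 2: positive semidefinite only
A_minus_B = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]   # equals 2I

min_A = min_on_unit_circle(A)            # close to 2, the smallest eigenvalue of A
min_B = min_on_unit_circle(B)            # close to 0
min_diff = min_on_unit_circle(A_minus_B) # close to 2, so A >= B in the partial order
```

A sampled minimum can only overestimate the true infimum; here the grid contains $t = 3\pi/4$, where both minima are attained.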

1.4 Approximations, Projections, and Decompositions

Many of the linear operators that we encounter in later chapters are projection 
operators, in particular orthogonal projection operators. As we will see in this 
section, orthogonal projection operators find best approximations from within a
subspace, that is, approximations that minimize a Hilbert space norm of the error. 
An orthogonal projection generates a decomposition of a vector into components in 
two orthogonal subspaces. We will also see how the more general oblique projection 
operators generate decompositions of vectors into components in two subspaces that 
are not necessarily orthogonal. 

Best Approximation, Orthogonal Projection, and Orthogonal Decomposition 

Let $S$ be a closed subspace of a Hilbert space $H$ and let $x$ be a vector in $H$. The best approximation problem is to find the vector in $S$ that is closest to $x$:

$$\hat{x} = \arg\min_{s \in S} \|x - s\|. \tag{1.55}$$

Most commonly, a Hilbert space norm involves a sum of squares (as in $\ell^2(\mathbb{Z})$) or integral of a square (as in $\mathcal{L}^2(\mathbb{R})$), in which case $\hat{x}$ is called a least-squares approximation. Of course, when $x$ is in $S$ then $\hat{x} = x$ uniquely solves the problem: it makes the approximation error $\|x - \hat{x}\|$ zero.$^{11}$ The interesting case is when $x$ is not in $S$.

Figure 1.11 illustrates the problem and its solution in ordinary Euclidean geometry. The point on the line $S$ that is closest to a point $x$ not on the line is uniquely determined by finding the circle centered at $x$ that is tangent to the line. Any other candidate $x'$ on the line lies outside the circle and is thus farther from $x$. Since a line tangent to a circle is always perpendicular to a line segment from the tangent point to the center of the circle, the solution $\hat{x}$ satisfies the following orthogonality property: $x - \hat{x} \perp S$. The projection theorem extends this geometric result to arbitrary Hilbert spaces. The approximation $\hat{x}$ and the residual $x - \hat{x}$ form an orthogonal decomposition of $x$ that is uniquely determined by the subspace $S$, and orthogonality of $x - \hat{x}$ and $S$ uniquely determines $\hat{x}$.

1.4.1 Projection Theorem 

The solution to the best approximation problem in a general Hilbert space is de- 
scribed by the following theorem: 



Theorem 1.26 (Projection theorem) Let $S$ be a closed subspace of Hilbert space $H$ and let $x$ be a vector in $H$.

(i) Existence: There exists $\hat{x} \in S$ such that $\|x - \hat{x}\| \le \|x - s\|$ for all $s \in S$.
(ii) Orthogonality: $x - \hat{x} \perp S$ is necessary and sufficient for determining $\hat{x}$.
(iii) Uniqueness: The vector $\hat{x}$ is unique.
(iv) Linearity: $\hat{x} = Px$ where $P$ is a linear operator that depends on $S$ and not on $x$.
(v) Idempotency: $P(Px) = Px$ for all $x \in H$.
(vi) Self-adjointness: $P = P^*$.

11 Recall from the definition of a norm, property (i), that $\|x - \hat{x}\|$ is nonnegative and zero if and only if $x - \hat{x} = 0$.

Figure 1.11: Illustration of best approximation. In Euclidean geometry, the best approximation $\hat{x}$ of $x$ on the line $S$ is obtained with error $x - \hat{x}$ orthogonal to $S$; any candidate $x'$ such that $x - x'$ is not orthogonal to $S$ is farther from $x$. This holds more generally in Hilbert spaces.



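Proof. We prove existence last since it is the most technical and is the only part that requires completeness of the space. (Orthogonality and uniqueness hold with $H$ replaced by any inner product space and $S$ replaced by any subspace.)

(ii) Orthogonality: Suppose that $\hat{x}$ minimizes $\|x - \hat{x}\|$ but that $x - \hat{x} \not\perp S$. Then there exists a unit vector $\varphi \in S$ such that $\langle x - \hat{x}, \varphi \rangle = \varepsilon \ne 0$. Let $s = \hat{x} + \varepsilon\varphi$ and note that $s$ is in $S$ since $S$ is a subspace. The calculation

$$\|x - s\|^2 = \|(x - \hat{x}) - \varepsilon\varphi\|^2 = \|x - \hat{x}\|^2 - \varepsilon^*\langle x - \hat{x}, \varphi \rangle - \varepsilon\langle \varphi, x - \hat{x} \rangle + |\varepsilon|^2\|\varphi\|^2 = \|x - \hat{x}\|^2 - |\varepsilon|^2 < \|x - \hat{x}\|^2$$

then shows that $\hat{x}$ is not the minimizing vector. This contradiction implies $x - \hat{x} \perp S$. This can also be written as $x - \hat{x} \in S^\perp$; see Figure 1.12.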
(iii) Uniqueness: Suppose $x - \hat{x} \perp S$. For any $s \in S$,

$$\|x - s\|^2 = \|(x - \hat{x}) + (\hat{x} - s)\|^2 \overset{(a)}{=} \|x - \hat{x}\|^2 + \|\hat{x} - s\|^2,$$

where $\hat{x} - s \in S$ implies $x - \hat{x} \perp \hat{x} - s$, which allows an application of the Pythagorean theorem in (a). This shows $\|x - s\| > \|x - \hat{x}\|$ for any $s \ne \hat{x}$.
(iv) Linearity: Let $\alpha$ be any scalar, and denote the best approximations in $S$ of $x_1$ and $x_2$ by $\hat{x}_1$ and $\hat{x}_2$. The orthogonality property implies $x_1 - \hat{x}_1 \in S^\perp$ and $x_2 - \hat{x}_2 \in S^\perp$. Since $S$ is a subspace, $\hat{x}_1 + \hat{x}_2 \in S$; and since $S^\perp$ is a subspace, $(x_1 - \hat{x}_1) + (x_2 - \hat{x}_2) \in S^\perp$. With $(x_1 + x_2) - (\hat{x}_1 + \hat{x}_2) \in S^\perp$ and $\hat{x}_1 + \hat{x}_2 \in S$, the uniqueness property shows that $\hat{x}_1 + \hat{x}_2$ is the best approximation of $x_1 + x_2$. This shows additivity. Similarly, since $S$ is a subspace, $\alpha\hat{x}_1 \in S$; and since $S^\perp$ is a subspace, $\alpha(x_1 - \hat{x}_1) \in S^\perp$. With $\alpha x_1 - \alpha\hat{x}_1 \in S^\perp$ and $\alpha\hat{x}_1 \in S$, the uniqueness property shows that $\alpha\hat{x}_1$ is the best approximation of $\alpha x_1$. This shows scalability.

Figure 1.12: The best approximation of $x \in H$ within closed subspace $S$ is uniquely determined by $x - \hat{x} \perp S$. The solution generates an orthogonal decomposition of $x$ into $\hat{x} \in S$ and $x - \hat{x} \in S^\perp$.
(v) Idempotency: The property to check is that the operator $P$ leaves $Px$ unchanged. This follows from two facts: $Px \in S$ and $Pu = u$ for all $u \in S$. That $Px$ is in $S$ is part of the definition of $\hat{x}$. For the second fact, let $u \in S$ and suppose $\hat{u}$ satisfies

$$\|u - \hat{u}\| \le \|u - s\| \quad \text{for all } s \in S.$$

By the uniqueness property, there can be only one such $\hat{u}$, and since $\hat{u} = u$ gives $\|u - \hat{u}\| = 0$ and the norm is nonnegative, we must have $\hat{u} = u$.
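(vi) Self-adjointness: We would like to show $\langle Px, y \rangle = \langle x, Py \rangle$ for all $x, y \in H$:

$$\langle Px, y \rangle = \langle Px, Py + (y - Py) \rangle \overset{(a)}{=} \langle Px, Py \rangle + \langle Px, y - Py \rangle \overset{(b)}{=} \langle Px, Py \rangle,$$

where (a) uses the distributivity of the inner product; and (b) follows from $Px \in S$ and $y - Py \in S^\perp$; similarly

$$\langle x, Py \rangle = \langle Px + (x - Px), Py \rangle = \langle Px, Py \rangle + \langle x - Px, Py \rangle = \langle Px, Py \rangle.$$

Thus $\langle Px, y \rangle = \langle x, Py \rangle$.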

(i) Existence: We finally show existence of a minimizing $\hat{x}$. If $x$ is in $S$, then $\hat{x} = x$ achieves the minimum so there is no question of existence. We thus restrict our attention to $x \notin S$. Let $\varepsilon = \inf_{s \in S} \|x - s\|$. Then there exists a sequence of vectors $s_0, s_1, \ldots$ in $S$ such that $\|x - s_k\| \to \varepsilon$; the challenge is to show that the infimum is achieved by some $\hat{x} \in S$. We do this by showing that $\{s_k\}_{k \ge 0}$ is a Cauchy sequence and thus converges, within the closed subspace $S$, to the desired $\hat{x}$.



By applying the parallelogram law (1.25) to $x - s_j$ and $s_i - x$,

$$\|(x - s_j) + (s_i - x)\|^2 + \|(x - s_j) - (s_i - x)\|^2 = 2\|x - s_j\|^2 + 2\|s_i - x\|^2.$$

Cancelling $x$ in the first term and moving the second term to the right yields

$$\|s_i - s_j\|^2 = 2\|x - s_j\|^2 + 2\|s_i - x\|^2 - 4\|x - \tfrac{1}{2}(s_i + s_j)\|^2. \tag{1.56}$$

Now since $S$ is a subspace, $\tfrac{1}{2}(s_i + s_j)$ is in $S$. Thus by the definition of $\varepsilon$ we have $\|x - \tfrac{1}{2}(s_i + s_j)\| \ge \varepsilon$. Substituting in (1.56) and using the nonnegativity of the norm,

$$0 \le \|s_i - s_j\|^2 \le 2\|x - s_j\|^2 + 2\|s_i - x\|^2 - 4\varepsilon^2.$$

With the convergence of $\|x - s_j\|$ and $\|s_i - x\|$ to $\varepsilon$, the right-hand side tends to $2\varepsilon^2 + 2\varepsilon^2 - 4\varepsilon^2 = 0$, so we conclude that $\{s_k\}_{k \ge 0}$ is a Cauchy sequence. Now since $S$ is a closed subspace of a complete space, $\{s_k\}_{k \ge 0}$ converges to some $\hat{x} \in S$. Since a norm is continuous, convergence of $\{s_k\}_{k \ge 0}$ to $\hat{x}$ implies $\|x - \hat{x}\| = \varepsilon$.

The projection theorem leads to a simple and unified methodology for computing 
best approximations through the normal equations. We develop this in detail after 
the introduction of bases in Section 1.5. The following example provides a preview.

Example 1.17 Consider the function $x(t) = \cos(\tfrac{3}{2}\pi t)$ in the Hilbert space $\mathcal{L}^2([0, 1])$. To find the degree-1 polynomial closest to $x$ directly (without using the projection theorem) would require solving

$$\min_{a_0, a_1} \int_0^1 \left| \cos(\tfrac{3}{2}\pi t) - (a_0 + a_1 t) \right|^2 dt.$$

Noting that the degree-1 polynomials form a closed subspace in this Hilbert space, the projection theorem shows that $(a_0, a_1)$ is determined uniquely by requiring

$$x(t) - \hat{x}(t) = \cos(\tfrac{3}{2}\pi t) - (a_0 + a_1 t)$$

to be orthogonal to the entire subspace of degree-1 polynomials. Imposing orthogonality to $1$ and $t$ gives two linearly independent equations to solve:

$$0 = \langle x(t) - \hat{x}(t), 1 \rangle = \int_0^1 \left( \cos(\tfrac{3}{2}\pi t) - (a_0 + a_1 t) \right) \cdot 1\, dt = -\frac{2}{3\pi} - a_0 - \frac{1}{2}a_1,$$
$$0 = \langle x(t) - \hat{x}(t), t \rangle = \int_0^1 \left( \cos(\tfrac{3}{2}\pi t) - (a_0 + a_1 t) \right) t\, dt = -\frac{4 + 6\pi}{9\pi^2} - \frac{1}{2}a_0 - \frac{1}{3}a_1.$$

Their solution is

$$a_0 = \frac{8 + 4\pi}{3\pi^2}, \qquad a_1 = -\frac{16 + 12\pi}{3\pi^2}.$$

Figure 1.13(a) shows the function and its degree-1 polynomial approximation.

Best approximations among degree-$k$ polynomials for $k = 1, 2, 3, 4, 5$ are shown in Figure 1.13(b). Increasing the degree increases the size of the subspace of polynomials, so the quality of approximation is naturally improved. We will see in Chapter 5 (Theorem 5.3) that since $x(t)$ is continuous, the error is driven to zero by letting $k$ grow without bound.
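The normal equations of Example 1.17 can be solved numerically as a check on the closed form. This sketch assumes $x(t) = \cos(\tfrac{3}{2}\pi t)$ on $[0, 1]$, uses the exact Gram entries $\langle 1, 1 \rangle = 1$, $\langle t, 1 \rangle = 1/2$, $\langle t, t \rangle = 1/3$, approximates the two right-hand sides with a composite midpoint rule (the helper `integrate` and its step count are illustrative choices), and solves the $2 \times 2$ system by Cramer's rule.

```python
import math

def x_func(t):
    return math.cos(1.5 * math.pi * t)

def integrate(f, a=0.0, b=1.0, n=20000):
    """Composite midpoint rule on [a, b]; accurate enough for smooth integrands."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Orthogonality of the residual to 1 and t gives the normal equations
#   [<1,1> <t,1>] [a0]   [<x,1>]
#   [<1,t> <t,t>] [a1] = [<x,t>]
g00, g01, g11 = 1.0, 0.5, 1.0 / 3.0
r0 = integrate(x_func)
r1 = integrate(lambda t: t * x_func(t))
det = g00 * g11 - g01 * g01
a0 = (r0 * g11 - g01 * r1) / det
a1 = (g00 * r1 - g01 * r0) / det

# Closed form from the example.
a0_exact = (8 + 4 * math.pi) / (3 * math.pi ** 2)
a1_exact = -(16 + 12 * math.pi) / (3 * math.pi ** 2)

def residual(t):
    return x_func(t) - (a0 + a1 * t)

# The residual should be (numerically) orthogonal to both 1 and t.
ortho = (integrate(residual), integrate(lambda t: residual(t) * t))
```

Extending the same approach to degree $k$ means a $(k+1) \times (k+1)$ Gram matrix with entries $1/(i + j + 1)$, which becomes badly conditioned as $k$ grows; that is one motivation for the orthonormal bases introduced later.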

The effect of the operator $P$ that arises in the projection theorem is to move the input vector $x$ in a direction orthogonal to the subspace $S$ until $S$ is reached at $\hat{x}$. In the following two sections, we show that $P$ has the defining characteristics of what we will call an orthogonal projection operator, and we describe the mapping of $x$ to an approximation $\hat{x}$ and residual $x - \hat{x}$ as what we will call an orthogonal decomposition. Projections and decompositions each have nonorthogonal (oblique) versions.
[Figure 1.13 panels: (a) Degree-1 best approximation, showing $x(t)$ and $\hat{x}(t)$; (b) Degree-1, 2, 3, 4 best approximations.]

Figure 1.13: (a) The best approximation of $x(t)$ (dashed) among degree-1 polynomials is $\hat{x}(t)$ (solid), where approximation quality is measured by the $\mathcal{L}^2$ norm on $[0, 1]$; see Example 1.17. The best approximation is determined by the orthogonality of $x - \hat{x}$ to the subspace of degree-1 polynomials. (b) Allowing higher-degree polynomials, for $k = 1$ (black), $k = 2$ (dark gray), $k = 3$ (light gray), $k = 4$ (dark red), increases the size of the subspace to which $x(t)$ is orthogonally projected and decreases the approximation error.



1.4.2 Projection Operators 

The operator $P$ that arises from solving the best approximation problem is an orthogonal projection operator, as per the following definition:$^{12}$



Definition 1.27 (Projection, orthogonal projection, oblique projection) 

(i) An idempotent operator $P$ is an operator such that $P^2 = P$.

(ii) A projection operator is a bounded linear operator that is idempotent. 

(iii) An orthogonal projection operator is a projection operator that is self- 
adjoint. 

(iv) An oblique projection operator is a projection operator that is not self- 
adjoint. 



An operator is idempotent when applying it twice is no different than applying 
it once. Setting certain components of a vector to a constant value is an idempotent 
operation, and when that constant is zero this operation is linear. The following 
example introduces a notation for the basic class of orthogonal projection operators 
that set a portion of a vector to zero. 



12 Two notes of caution are warranted. First, we use the term projection only with respect to 
Hilbert space geometry; many other mathematical and scientific meanings are inconsistent with 
this. Second, some authors use "projection" to mean "orthogonal projection." We will not adopt 
this potentially-confusing shorthand because many important properties and uses of projection 
operators hold for all projections — both orthogonal and oblique. 
Example 1.18 (Projection via domain restriction) Let $I$ be a subset of $\mathbb{Z}$, and define the linear operator $1_I : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ by

$$y = 1_I x, \quad \text{where} \quad y_k = \begin{cases} x_k, & \text{for } k \in I; \\ 0, & \text{otherwise}. \end{cases} \tag{1.57}$$

This is a special case of the linear operator in Example 1.14(iii), with

$$\alpha_k = \begin{cases} 1, & \text{for } k \in I; \\ 0, & \text{otherwise}. \end{cases}$$

This operator is obviously idempotent, and it is self-adjoint because of the adjoint computation in Example 1.15(iii) (the $\alpha_k$ are real, so $\alpha_k^* = \alpha_k$). Thus $1_I$ is an orthogonal projection operator.
The same notation is used for vector spaces with domains other than $\mathbb{Z}$. For example, with $I$ a subset of $\mathbb{R}$, we define the linear operator $1_I : \mathcal{L}^2(\mathbb{R}) \to \mathcal{L}^2(\mathbb{R})$ by

$$y = 1_I x, \quad \text{where} \quad y(t) = \begin{cases} x(t), & \text{for } t \in I; \\ 0, & \text{otherwise}. \end{cases} \tag{1.58}$$

Exercise 1.30 establishes properties of this operator.
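The two properties claimed for $1_I$ in (1.57) are easy to observe on a finite window. In this sketch the window $k = -5, \dots, 5$ stands in for $\mathbb{Z}$ and the index set `I` is a hypothetical choice; the function `restrict` zeroes every entry outside `I`.

```python
import random

def restrict(x, index_set):
    """1_I x from (1.57): keep x_k for k in I and set every other entry to zero."""
    return {k: (v if k in index_set else 0) for k, v in x.items()}

def inner(x, y):
    return sum(x[k] * y[k].conjugate() for k in x)

random.seed(4)
window = range(-5, 6)   # finite stand-in for Z (an assumption)
I = {-2, 0, 1, 4}       # hypothetical index set
x = {k: complex(random.gauss(0, 1), random.gauss(0, 1)) for k in window}
y = {k: complex(random.gauss(0, 1), random.gauss(0, 1)) for k in window}

idempotent = restrict(restrict(x, I), I) == restrict(x, I)             # P^2 = P
self_adjoint_gap = abs(inner(restrict(x, I), y) - inner(x, restrict(y, I)))  # P = P*
```

Both inner products reduce to $\sum_{k \in I} x_k y_k^*$, which is why self-adjointness holds exactly here.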

The following theorem uses orthogonality of certain vectors to prove that an 
operator is an orthogonal projection operator. This complements the projection 
theorem, since here an operator is specified rather than a subspace. We discuss the 
subspace that is implicit in the theorem after the proof. 



Theorem 1.28 A bounded linear operator $P$ on a Hilbert space $H$ satisfies

$$\langle x - Px, Py \rangle = 0 \quad \text{for all } x, y \in H \tag{1.59}$$

if and only if $P$ is an orthogonal projection operator.







Proof. We first prove that idempotency and self-adjointness of $P$ imply that (1.59) holds. With $x$ and $y$ arbitrary vectors in $H$,

$$\langle x - Px, Py \rangle = \langle x, Py \rangle - \langle Px, Py \rangle \overset{(a)}{=} \langle x, Py \rangle - \langle x, P^*Py \rangle \overset{(b)}{=} \langle x, Py \rangle - \langle x, P^2y \rangle \overset{(c)}{=} \langle x, Py \rangle - \langle x, Py \rangle = 0,$$

where (a) follows from the definition of the adjoint operator; (b) from self-adjointness of $P$; and (c) from idempotency of $P$.

Now suppose that (1.59) holds. With $z \in H$ arbitrary, since (1.59) holds for $x = Pz$ and $y = z - Pz$, we get

$$0 = \langle Pz - P(Pz), P(z - Pz) \rangle = \|(P - P^2)z\|^2,$$

which implies $Pz = P^2z$. Since $z$ is arbitrary, we have $P = P^2$ (idempotency of $P$). By Hermitian symmetry of the inner product, (1.59) implies

$$\langle Px, y - Py \rangle = 0 \quad \text{for all } x, y \in H. \tag{1.60}$$
Thus, for any $x, y \in H$,

$$\langle P^*x, y \rangle \overset{(a)}{=} \langle x, Py \rangle \overset{(b)}{=} \langle Px, Py \rangle \overset{(c)}{=} \langle Px, y \rangle,$$

where (a) uses the definition of adjoint; (b) uses (1.59); and (c) uses (1.60). This implies $\langle P^*x - Px, y \rangle = 0$. By choosing $y = P^*x - Px$, we have that $\|P^*x - Px\| = 0$ for all $x$, and thus $P^* = P$ (self-adjointness of $P$).

The range of any linear operator is a subspace. In the setting of the preceding theorem, we may associate with $P$ the closed subspace $S = \mathcal{R}(P)$. Then we have that $P$ is the orthogonal projection operator onto $S$. The orthogonality equation (1.59) is a restatement of the projection residual $x - Px$ being orthogonal to $S$.

The following example gives an explicit expression for projection operators onto 1-dimensional subspaces. This was discussed informally in Section 1.1.

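Example 1.19 (Orthogonal projection onto 1-dimensional subspace) Given a vector $\varphi \in H$ of unit norm, let

$$Px = \langle x, \varphi \rangle \varphi. \tag{1.61}$$

This is a linear operator because of the distributivity and linearity in the first argument of the inner product. To use Theorem 1.28 to show that $P$ is the orthogonal projection operator onto the subspace of scalar multiples of $\varphi$, we verify the idempotency and self-adjointness of $P$. Idempotency is proven by the following computation:

$$P^2x = \langle \langle x, \varphi \rangle \varphi, \varphi \rangle \varphi \overset{(a)}{=} \langle x, \varphi \rangle \langle \varphi, \varphi \rangle \varphi \overset{(b)}{=} \langle x, \varphi \rangle \varphi = Px,$$

where (a) uses the linearity in the first argument of the inner product; and (b) follows from $\langle \varphi, \varphi \rangle = 1$. Self-adjointness is proven by the following computation:

$$\langle Px, y \rangle = \langle \langle x, \varphi \rangle \varphi, y \rangle \overset{(a)}{=} \langle x, \varphi \rangle \langle \varphi, y \rangle = \langle \varphi, y \rangle \langle x, \varphi \rangle \overset{(b)}{=} \langle y, \varphi \rangle^* \langle x, \varphi \rangle \overset{(c)}{=} \langle x, \langle y, \varphi \rangle \varphi \rangle = \langle x, Py \rangle,$$

where (a) follows from linearity in the first argument of the inner product; (b) from conjugate symmetry of the inner product; and (c) from conjugate linearity in the second argument of the inner product.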
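The projection (1.61) can be checked numerically in $\mathbb{C}^n$. This sketch normalizes a random vector to obtain $\varphi$ (the dimension and seed are arbitrary) and measures idempotency, self-adjointness, and the orthogonality condition (1.59) from Theorem 1.28.

```python
import math
import random

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def scale(c, v):
    return [c * a for a in v]

def project(x, phi):
    """P x = <x, phi> phi from (1.61); phi is assumed to have unit norm."""
    return scale(inner(x, phi), phi)

def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

random.seed(5)
n = 5
v = rand_vec(n)
phi = scale(1 / math.sqrt(inner(v, v).real), v)   # normalize so that <phi, phi> = 1
x, y = rand_vec(n), rand_vec(n)

Px, Py = project(x, phi), project(y, phi)
idempotency_gap = max(abs(a - b) for a, b in zip(project(Px, phi), Px))
self_adjoint_gap = abs(inner(Px, y) - inner(x, Py))
residual = [a - b for a, b in zip(x, Px)]
orthogonality_gap = abs(inner(residual, Py))      # condition (1.59)
```

If `phi` were not normalized, `project` would scale its output by $\|\varphi\|^2$ and idempotency would fail, which matches the role of step (b) in the computation above.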

Collections of 1-dimensional projections are central to representations using bases, which are introduced in Section 1.5 and developed in several subsequent chapters. Solved Exercise 1.4 extends the 1-dimensional example to orthogonal projection onto subspaces of higher dimensions.

The final theorem of the section establishes important connections between inverses, adjoints, and projections; its simple proof is left for Exercise 1.31.
Figure 1.14: Two-dimensional range of the oblique projection operator $P$ from Example 1.20. It is the plane $x_0 + x_2 = x_1$. For example, vector $x = [6 \;\; 6 \;\; 8]^\top$ is projected via $P$ onto $Px = [2 \;\; 6 \;\; 4]^\top$, not an orthogonal projection.



Theorem 1.29 Let $A : H \to H_1$ and $B : H_1 \to H$ be bounded linear operators. If $A$ is a left inverse of $B$, then $BA$ is a projection operator. Furthermore, if $B = A^*$, then $BA = A^*A$ is an orthogonal projection operator.



Example 1.20 (Projection onto a subspace) Let

$$A = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

Since $A$ is a left inverse of $B$ ($AB = I$), we know from Theorem 1.29 that $P = BA$ is a projection operator. Explicitly,

$$BA = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix},$$

from which one can verify $P^2 = P$. A description of the 2-dimensional range of this projection operator is most transparent from $B$: it is the set of 3-tuples with middle component equal to the sum of the first and last (see Figure 1.14). Note that $P \neq P^*$, so the projection is oblique.
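Because the matrices are small, the claims of Example 1.20 can be checked in exact rational arithmetic. This sketch uses Python's `fractions` module with $A = \frac12\begin{bmatrix} 1 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$. The `matmul` and `transpose` helpers are ad hoc (hypothetical names), and the test vector is the one from Figure 1.14.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

h = Fraction(1, 2)
A = [[h, h, -h],
     [-h, h, h]]            # 2 x 3 left inverse of B
B = [[1, 0],
     [1, 1],
     [0, 1]]                # 3 x 2; its columns span the plane x_0 + x_2 = x_1
P = matmul(B, A)            # 3 x 3 projection BA

print(matmul(A, B) == [[1, 0], [0, 1]])   # True: A is a left inverse of B
print(matmul(P, P) == P)                  # True: P is idempotent
print(P == transpose(P))                  # False: P is not self-adjoint (oblique)

Px = [int(row[0]) for row in matmul(P, [[6], [6], [8]])]
print(Px)                                 # [2, 6, 4], as in Figure 1.14
```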

The final example of the section draws together some earlier results to give an orthogonal projection operator on $\mathcal{L}^2(\mathbb{R})$. It illustrates the basics of sampling and interpolation, to which we will return in Chapter 3.

Example 1.21 (Orthogonal projection operator on $\mathcal{L}^2(\mathbb{R})$) Let $A : \mathcal{L}^2(\mathbb{R}) \to \ell^2(\mathbb{Z})$ be the local averaging operator (1.45) and let $A^* : \ell^2(\mathbb{Z}) \to \mathcal{L}^2(\mathbb{R})$ be its adjoint, as derived in Example 1.16. If we verify that $A$ is a left inverse of $A^*$, we will have as a consequence of Theorem 1.29 that $A^*A$ is an orthogonal projection operator.
Figure 1.15: Illustration of an orthogonal projection operator on $\mathcal{L}^2(\mathbb{R})$. The linear operator $A$ and its adjoint $A^*$ illustrated in Figure 1.10 satisfy $AA^* = I$, so $A^*A$ is an orthogonal projection operator. The range of $A^*$ is the subspace of $\mathcal{L}^2(\mathbb{R})$ consisting of functions that are constant on all intervals $[n - \frac12, n + \frac12)$, $n \in \mathbb{Z}$. Thus, $\hat{x} = A^*Ax$ (piecewise constant, red) is the best approximation of $x$ (smooth, blue) in this subspace.



To check that $A$ is a left inverse of $A^*$, consider the application of $AA^*$ to an arbitrary sequence in $\ell^2(\mathbb{Z})$. (Recall that the separate effects of $A$ and $A^*$ are illustrated in Figure 1.10.) Remembering to compose from right to left, $AA^*$ starts with a sequence $x_n$, creates a function equal to $x_n$ on each interval $[n - \frac12, n + \frac12)$, and then recovers the original sequence by finding the average value of the function on each interval $[n - \frac12, n + \frac12)$. So $AA^*$ is indeed an identity operator. One conclusion to draw by combining the projection theorem with $A^*A$ being an orthogonal projection operator is the following: Given a function $x(t) \in \mathcal{L}^2(\mathbb{R})$, the function in the subspace of piecewise-constant functions $A^*\ell^2(\mathbb{Z})$ that is closest to $x(t)$ in $\mathcal{L}^2$ norm is the one obtained by replacing $x(t)$, $t \in [n - \frac12, n + \frac12)$, by its local average $\int_{n-1/2}^{n+1/2} x(t)\, dt$. The result of applying $A^*A$ is depicted in Figure 1.15.
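A discretized version of Example 1.21 makes the best-approximation claim concrete. In this sketch, integrals over each interval $[n - \frac12, n + \frac12)$ are approximated by averages over a uniform midpoint grid, only the intervals $n = -3, \ldots, 3$ are represented, and the test function $x(t) = e^{-t^2}$ is an arbitrary smooth choice. These are assumptions of the illustration, not part of the example, and the helper names are hypothetical. The code checks that $AA^* = I$ and that moving any one local constant away from the local average increases the squared error.

```python
import math

SAMPLES = 1000             # grid points per unit-length interval (discretization choice)
INTERVALS = range(-3, 4)   # only the intervals n = -3, ..., 3 are represented

def grid(n):
    """Midpoint grid on [n - 1/2, n + 1/2)."""
    h = 1.0 / SAMPLES
    return [n - 0.5 + (i + 0.5) * h for i in range(SAMPLES)]

def local_average(f):
    """A: function -> sequence of averages of f over each unit interval."""
    return {n: sum(f(t) for t in grid(n)) / SAMPLES for n in INTERVALS}

def piecewise_constant(a):
    """A*: sequence -> function equal to a[n] on [n - 1/2, n + 1/2)."""
    return lambda t: a.get(math.floor(t + 0.5), 0.0)

def sq_error(f, g):
    """Riemann-sum approximation of the squared L2 distance over the intervals."""
    return sum((f(t) - g(t)) ** 2 for n in INTERVALS for t in grid(n)) / SAMPLES

def x(t):
    return math.exp(-t * t)    # an arbitrary smooth test function

# A A* = I: averaging a piecewise-constant function returns its values.
a = {n: float(n * n) for n in INTERVALS}
round_trip = local_average(piecewise_constant(a))
left_inverse = all(abs(round_trip[n] - a[n]) < 1e-9 for n in INTERVALS)

# A* A x is the best piecewise-constant approximation: nudging any single
# constant away from the local average increases the squared error.
avg = local_average(x)
x_hat = piecewise_constant(avg)
best = sq_error(x, x_hat)
worse = []
for n in INTERVALS:
    for delta in (-0.05, 0.05):
        b = dict(avg)
        b[n] += delta
        worse.append(sq_error(x, piecewise_constant(b)) > best)
print(left_inverse, all(worse))   # True True
```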



1.4.3 Direct Sums and Subspace Decompositions

In the projection theorem, the best approximation $\hat{x}$ is uniquely determined by the orthogonality of the residual $x - \hat{x}$ to $S$. Thus, the projection theorem proves that $x$ can be written uniquely as

$$x = x_S + x_{S^\perp}, \quad \text{where } x_S \in S \text{ and } x_{S^\perp} \in S^\perp, \tag{1.62}$$

because we uniquely choose $x_S = \hat{x}$ and $x_{S^\perp} = x - \hat{x}$. Being able to uniquely write the sum (1.62) for any $x$ defines a decomposition; while having an orthogonal pair of subspaces is an important case, it is not necessary.



Definition 1.30 (Direct sum and decomposition) A vector space $V$ is a direct sum of subspaces $S$ and $T$, denoted $V = S \oplus T$, when any nonzero vector $x \in V$ can be written uniquely as

$$x = x_S + x_T, \quad \text{where } x_S \in S \text{ and } x_T \in T. \tag{1.63}$$
[Figure 1.16 panels: (a) Decomposition. (b) Orthogonal projection. (c) Oblique projection.]

Figure 1.16: (a) A vector space $V$ is decomposed as a direct sum $S \oplus T$ when any $x \in V$ can be written uniquely as a sum of components in $S$ and $T$. (b) An orthogonal projection operator generates an orthogonal direct sum decomposition of a Hilbert space. It decomposes vector $x$ into $x_S \in S$ and $x_{S^\perp} \in S^\perp$. (c) An oblique projection operator generates a nonorthogonal direct sum decomposition of a Hilbert space. It decomposes vector $x$ into $x_S \in S$ and $x_{\tilde{S}^\perp} \in \tilde{S}^\perp$.



The subspaces S and T form a decomposition of V, and the vectors xs and xt 
form a decomposition of x. When S and T are orthogonal, this is called an 
orthogonal decomposition. 



A general direct sum decomposition $V = S \oplus T$ is illustrated in Figure 1.16(a).

When $S$ is a closed subspace of a Hilbert space $H$, the projection theorem generates the unique decomposition (1.62); thus, $H = S \oplus S^\perp$. It is tempting to write $V = S \oplus S^\perp$ for any (not necessarily closed) subspace of any (not necessarily complete) vector space. However, this is not always possible. The following example highlights the necessity of working with closed subspaces of a complete space. As noted before Example 1.11, we frequently work with the closure of a span to avoid the pitfalls of subspaces that are not closed.

Example 1.22 (Failure of direct sum $S \oplus S^\perp$) As in Example 1.11, consider Hilbert space $\ell^2(\mathbb{N})$ and for each $k \in \mathbb{N}$, let the sequence $s_k$ be 0 except for a 1 in the $k$th position. Then $S = \operatorname{span}(\{s_0, s_1, \ldots\})$ consists of all vectors in $\ell^2(\mathbb{N})$ that have a finite number of nonzero entries, and $S$ is a subspace. Let $x \in \ell^2(\mathbb{N})$ be a nonzero vector. Then $x_n \neq 0$ for some $n \in \mathbb{N}$, so $x \not\perp s_n$, which implies $x \notin S^\perp$. Since no nonzero vector is in $S^\perp$, we have that $S^\perp = \{0\}$. Since $S$ itself is not all of $\ell^2(\mathbb{N})$, one cannot write every $x \in \ell^2(\mathbb{N})$ as in (1.62).

The main aim of this section is to extend the connection between decompositions and projections to the general (oblique) case. The following theorem establishes that a projection operator $P$ generates a direct sum decomposition as illustrated in Figure 1.16(c). The dashed line shows the effect of the operator, $x_S = Px$, and the residual $x_{\tilde{S}^\perp} = x - x_S$ is in a subspace we denote $\tilde{S}^\perp$ rather than $T$ for reasons that will become clear.
Theorem 1.31 Let $H$ be a Hilbert space.

(i) Let $P$ be a projection operator on $H$, and let $S = \mathcal{R}(P)$ and $T = \mathcal{N}(P)$. Then $H = S \oplus T$.
(ii) Conversely, let closed subspaces $S$ and $T$ satisfy $H = S \oplus T$. Then there exists a projection operator $P$ on $H$ such that $S = \mathcal{R}(P)$ and $T = \mathcal{N}(P)$.



Proof. (i) Let $x \in H$. We would like to prove that a decomposition of the form (1.63) exists and is unique. Existence is verified by letting $x_S = Px$, which obviously is in $S = \mathcal{R}(P)$; and $x_T = x - Px$, which is in $T = \mathcal{N}(P)$ because

$$P x_T = P(x - Px) = Px - P^2 x \stackrel{(a)}{=} Px - Px = 0,$$

where (a) uses that $P$ is idempotent. For uniqueness, suppose

$$x = x_S' + x_T', \quad \text{where } x_S' \in S \text{ and } x_T' \in T.$$

Equating the two expansions of $x$ and applying $P$, we have

$$0 = P\big((x_S - x_S') + (x_T - x_T')\big) = P(x_S - x_S') + P(x_T - x_T') \stackrel{(a)}{=} P(x_S - x_S') \stackrel{(b)}{=} x_S - x_S',$$

where (a) follows from $x_T - x_T'$ lying in $T$, the null space of $P$; and (b) follows from $x_S - x_S'$ lying in $S$ and $P$ equaling the identity on $S$ (if $s = Pz$, then $Ps = P^2 z = Pz = s$). From this, $x_S' = x_S$ and $x_T' = x_T$ follow.

(ii) Define the desired projection operator $P$ from the unique decomposition of any $x \in H$ of the form (1.63) through $Px = x_S$. The linearity of $P$ follows easily from the assumed uniqueness of decompositions of vectors. By construction, the range of $P$ is contained in $S$. It is actually all of $S$ because any $x \in S$ can be uniquely decomposed as $x + 0$ with $x \in S$ and $0 \in T$. Similarly, the null space of $P$ contains $T$ because any $x \in T$ can be uniquely decomposed as $0 + x$ with $0 \in S$ and $x \in T$, showing that $Px = 0$. The null space of $P$ is not larger than $T$ because any vector $x \in H \setminus T$ can be written uniquely as in (1.63) with $x_S \neq 0$, so $Px \neq 0$. It remains only to verify that $P$ is idempotent. This follows from $Px \in S$ and that $P$ equals the identity on $S$.

The following example makes explicit the form of a (possibly oblique) projection when $S$ is a 1-dimensional subspace. For consistency with later developments, the illustration of Theorem 1.31 in Figure 1.16(c) uses $\tilde{S}^\perp$ in place of $T$. Since $S$ and $T = \tilde{S}^\perp$ have complementary dimension (adding to the whole Hilbert space), and so do $\tilde{S}$ and $\tilde{S}^\perp$, we have that $S$ and $\tilde{S}$ are of the same dimension. When $\tilde{S} = S$, the projection and decomposition are orthogonal, and Figure 1.16(c) reduces to Figure 1.12. In the example, this corresponds to $\tilde{\varphi} = \varphi$.

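Example 1.23 (Oblique projection onto 1-dimensional subspace) Let $S$ be the multiples of vector $\varphi \in H$ of unit norm, and let $\tilde{S}$ be the multiples of an arbitrary vector $\tilde{\varphi} \in H$. The operator

$$Px = \langle x, \tilde{\varphi}\rangle \varphi \tag{1.64}$$

is linear and has range contained in $S$. We will find conditions under which it generates a decomposition $H = S \oplus \tilde{S}^\perp$. Since

$$P^2 x = \langle \langle x, \tilde{\varphi}\rangle \varphi, \tilde{\varphi}\rangle \varphi \stackrel{(a)}{=} \langle x, \tilde{\varphi}\rangle \langle \varphi, \tilde{\varphi}\rangle \varphi,$$

where (a) uses the linearity in the first argument of the inner product, $P$ is idempotent if and only if $\langle \varphi, \tilde{\varphi}\rangle = 1$. Under this condition, we have $H = \mathcal{R}(P) \oplus \mathcal{N}(P)$ by Theorem 1.31. This can also be written as $H = S \oplus \tilde{S}^\perp$ because $\mathcal{N}(P)$ and $\tilde{S}^\perp$ are both precisely the set $\{x \in H \mid \langle x, \tilde{\varphi}\rangle = 0\}$.

If $\langle \varphi, \tilde{\varphi}\rangle \neq 1$ but also $\langle \varphi, \tilde{\varphi}\rangle \neq 0$, a simple adjustment of the length of $\tilde{\varphi}$ will make $P$ a projection operator. However, if $\langle \varphi, \tilde{\varphi}\rangle = 0$, it is not possible to decompose $H$ as desired. In fact, $S$ and $\tilde{S}$ are then orthogonal, so $S$ is contained in $\tilde{S}^\perp$ (in $\mathbb{R}^2$ the two are the same subspace), and $S \cap \tilde{S}^\perp \neq \{0\}$ rules out a direct sum.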
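A minimal numerical sketch of Example 1.23 in $\mathbb{R}^2$ follows. The particular choices $\varphi = e_0$ and $\tilde{\varphi} \propto [1 \;\; 1]^T$ are illustrative assumptions; $\tilde{\varphi}$ is rescaled so that $\langle \varphi, \tilde{\varphi}\rangle = 1$, which is the length adjustment mentioned above. The function name `oblique` is hypothetical. The last lines show the failure of idempotency when $\langle \varphi, \tilde{\varphi}\rangle = 0$.

```python
def inner(x, y):
    """Real inner product <x, y> = sum_n x_n y_n on R^N."""
    return sum(a * b for a, b in zip(x, y))

def oblique(x, phi, phi_tilde):
    """P x = <x, phi_tilde> phi, the operator (1.64)."""
    c = inner(x, phi_tilde)
    return [c * p for p in phi]

phi = [1.0, 0.0]                               # unit norm; S = multiples of phi
v = [1.0, 1.0]                                 # illustrative direction for S-tilde
phi_tilde = [t / inner(phi, v) for t in v]     # rescale so that <phi, phi_tilde> = 1

x = [2.0, 3.0]
Px = oblique(x, phi, phi_tilde)
residual = [a - b for a, b in zip(x, Px)]
print(Px)                                   # [5.0, 0.0]
print(oblique(Px, phi, phi_tilde) == Px)    # True: P is idempotent
print(inner(residual, phi_tilde))           # 0.0: x - Px lies in S-tilde-perp

y = [0.0, 1.0]
print(inner(Px, y), inner(x, oblique(y, phi, phi_tilde)))  # 0.0 2.0: <Px, y> != <x, Py>

bad = [0.0, 1.0]                                # <phi, bad> = 0
print(oblique(oblique(x, phi, bad), phi, bad))  # [0.0, 0.0], but P x = [3.0, 0.0]
```

Since $\langle Px, y\rangle \neq \langle x, Py\rangle$ for this pair, $P$ is not self-adjoint, which is what makes the projection oblique.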

1.4.4 Minimum Mean-Squared Error Estimation

The set of complex random variables can be viewed as a vector space over the complex numbers (see Section 1.2.4). With the restriction of finite second moments (which is implicit through the remainder of this section), this set forms a Hilbert space under the inner product (1.33):

$$\langle x, y\rangle = E[x y^*].$$

The square of the norm of the difference between vectors in this vector space is the mean-squared error (MSE) between the random variables:

$$\|x - \hat{x}\|^2 = E[|x - \hat{x}|^2]. \tag{1.65}$$

Since minimizing MSE is equivalent to minimizing a Hilbert space norm, many minimum MSE (MMSE) estimation problems are solved easily with the projection theorem. Throughout this section, MMSE estimators are called optimal, whether or not the estimator is constrained to a particular form.

Linear Estimation  Let $x$ and $y_1, y_2, \ldots, y_K$ be jointly distributed complex random variables. A linear estimator$^{13}$ of $x$ from the $y_k$'s is a random variable of the form

$$\hat{x} = \alpha_0 + \alpha_1 y_1 + \alpha_2 y_2 + \cdots + \alpha_K y_K. \tag{1.66}$$

When the coefficients are chosen to minimize the MSE, the estimator is called the linear minimum mean-squared error (LMMSE) estimator. Since (1.66) places $\hat{x}$ in a closed subspace $S$ of a Hilbert space of random variables, the projection theorem dictates that the optimal estimator $\hat{x}$ must be such that the error $x - \hat{x}$ is orthogonal to the subspace: $x - \hat{x} \perp S$.



13 A function of the form $f(x_1, x_2, \ldots, x_K) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_K x_K$, where the $\alpha_k$'s are constants, is called affine. The estimator in (1.66) is called linear even though $\hat{x}$ is an affine function of $y_1, y_2, \ldots, y_K$ because $\hat{x}$ is a linear function of $1, y_1, y_2, \ldots, y_K$, and 1 can be regarded as a random variable.
Instead of trying to express that $x - \hat{x}$ is orthogonal to every vector in $S$, it suffices to write enough linearly independent equations to be able to solve for the $\alpha_k$'s in (1.66). Since constant random variables are in $S$ (by setting $\alpha_1 = \alpha_2 = \cdots = \alpha_K = 0$), we must have

$$0 \stackrel{(a)}{=} \langle x - \hat{x}, 1\rangle \stackrel{(b)}{=} E[x - \hat{x}] \stackrel{(c)}{=} E[x] - E[\hat{x}] \stackrel{(d)}{=} E[x] - \left(\alpha_0 + \alpha_1 E[y_1] + \alpha_2 E[y_2] + \cdots + \alpha_K E[y_K]\right), \tag{1.67a}$$

where (a) follows from the desired orthogonality; (b) from (1.33); and (c) and (d) from linearity of the expectation. We also have that each $y_k$ is in $S$, so by analogous steps

$$\begin{aligned} 0 &= \langle x - \hat{x}, y_k\rangle = E[(x - \hat{x})y_k^*] = E[x y_k^*] - E[\hat{x} y_k^*] \\ &= E[x y_k^*] - \left(\alpha_0 E[y_k^*] + \alpha_1 E[y_1 y_k^*] + \alpha_2 E[y_2 y_k^*] + \cdots + \alpha_K E[y_K y_k^*]\right), \end{aligned} \tag{1.67b}$$

for $k = 1, 2, \ldots, K$. Equations (1.67) can be rearranged using a matrix-vector product as



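$$\begin{bmatrix} 1 & E[y_1] & E[y_2] & \cdots & E[y_K] \\ E[y_1^*] & E[|y_1|^2] & E[y_2 y_1^*] & \cdots & E[y_K y_1^*] \\ E[y_2^*] & E[y_1 y_2^*] & E[|y_2|^2] & \cdots & E[y_K y_2^*] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ E[y_K^*] & E[y_1 y_K^*] & E[y_2 y_K^*] & \cdots & E[|y_K|^2] \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{bmatrix} = \begin{bmatrix} E[x] \\ E[x y_1^*] \\ E[x y_2^*] \\ \vdots \\ E[x y_K^*] \end{bmatrix}. \tag{1.68}$$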



This system of equations will usually have a unique solution. The solution fails to be unique if and only if $\{1, y_1, \ldots, y_K\}$ is a linearly dependent set. In this case, (1.68) will have multiple solutions $\{\alpha_k\}_{k=0}^{K}$, but all the solutions yield the same estimator.

It is critical in this result that the set of estimators forms a subspace, but this does not mean that the estimator must be an affine function of the observed data. For example, the estimator of $x$ from a single scalar $y$

$$\hat{x} = \beta_0 + \beta_1 y + \beta_2 y^2 + \cdots + \beta_K y^K \tag{1.69}$$

fits the form of (1.66) with $y_k$ set to $y^k$. Thus, assuming $x$ has finite second moment and $y$ has finite moments up to order $2K$, (1.68) can be used to optimize the estimator. Assuming $y$ is real, (1.68) simplifies to



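$$\begin{bmatrix} 1 & E[y] & E[y^2] & \cdots & E[y^K] \\ E[y] & E[y^2] & E[y^3] & \cdots & E[y^{K+1}] \\ E[y^2] & E[y^3] & E[y^4] & \cdots & E[y^{K+2}] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ E[y^K] & E[y^{K+1}] & E[y^{K+2}] & \cdots & E[y^{2K}] \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix} = \begin{bmatrix} E[x] \\ E[x y] \\ E[x y^2] \\ \vdots \\ E[x y^K] \end{bmatrix}. \tag{1.70}$$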
Figure 1.17: Minimum mean-squared error estimators of $x$ from $y$ from Examples 1.24 and 1.25. The joint distribution of $x$ and $y$ is uniform over the shaded region. The optimal estimates of the form $\hat{x}_1 = \alpha_0 + \alpha_1 y$ (dashed line) and $\hat{x}_2 = \beta_0 + \beta_1 y + \beta_2 y^2$ (solid curve) are derived in Example 1.24. The red curve is the optimal estimator $\frac12(1 + \sqrt{y})$ derived in Example 1.25.



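Example 1.24 (Linear MMSE estimators) Suppose the joint distribution of $x$ and $y$ is uniform over the region shaded in Figure 1.17. Since the area of the shaded region is 1/3, the joint PDF of $x$ and $y$ is

$$f_{x,y}(s,t) = \begin{cases} 3, & s \in [0,1] \text{ and } t \in [0, s^2]; \\ 0, & \text{otherwise.} \end{cases}$$

We wish to find estimators of $x$ from $y$

$$\hat{x}_1 = \alpha_0 + \alpha_1 y \quad\text{and}\quad \hat{x}_2 = \beta_0 + \beta_1 y + \beta_2 y^2$$

that are optimal over the choices of coefficients $\{\alpha_0, \alpha_1\}$ and $\{\beta_0, \beta_1, \beta_2\}$.

To form the system of equations (1.68), we make the following computations:

$$E[x] = \int_0^1 \int_0^{s^2} 3s \, dt \, ds = \frac{3}{4}, \qquad E[y] = \int_0^1 \int_0^{s^2} 3t \, dt \, ds = \frac{3}{10},$$
$$E[xy] = \int_0^1 \int_0^{s^2} 3st \, dt \, ds = \frac{1}{4}, \qquad E[y^2] = \int_0^1 \int_0^{s^2} 3t^2 \, dt \, ds = \frac{1}{7}.$$

Then $\{\alpha_0, \alpha_1\}$ is determined by solving

$$\begin{bmatrix} 1 & 3/10 \\ 3/10 & 1/7 \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} = \begin{bmatrix} 3/4 \\ 1/4 \end{bmatrix}$$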
to obtain $\alpha_0 = 45/74$ and $\alpha_1 = 35/74$. The estimate $\hat{x}_1$ is shown as a function of the observation $y = t$ by the dashed line in Figure 1.17.

To find $\{\beta_0, \beta_1, \beta_2\}$, we require three additional moments:

$$E[xy^2] = \int_0^1 \int_0^{s^2} 3st^2 \, dt \, ds = \frac{1}{8}, \qquad E[y^3] = \int_0^1 \int_0^{s^2} 3t^3 \, dt \, ds = \frac{1}{12}, \qquad E[y^4] = \int_0^1 \int_0^{s^2} 3t^4 \, dt \, ds = \frac{3}{55}.$$

The system of equations (1.68) becomes

$$\begin{bmatrix} 1 & 3/10 & 1/7 \\ 3/10 & 1/7 & 1/12 \\ 1/7 & 1/12 & 3/55 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} 3/4 \\ 1/4 \\ 1/8 \end{bmatrix},$$

which yields $\beta_0 = 12915/22558$, $\beta_1 = 17745/22558$, and $\beta_2 = -4620/11279$. The estimate $\hat{x}_2$ is shown as a function of the observation $y = t$ by the solid curve in Figure 1.17.
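The two linear systems of Example 1.24 can be solved in exact rational arithmetic as a check. This sketch assumes only the joint PDF of the example: evaluating the same integrals for general $k$ gives $E[y^k] = 3/((k+1)(2k+3))$ and $E[x y^k] = 3/((k+1)(2k+4))$, which reproduce the moments used in the example. It assembles (1.70), solves it with a small Gauss-Jordan routine (a hypothetical helper, not a library call), and also reports the MSE through $E[|x - \hat{x}|^2] = E[x^2] - \sum_k \beta_k E[x y^k]$. That identity follows from the orthogonality of the error to $\hat{x}$, with $E[x^2] = 3/5$ here.

```python
from fractions import Fraction as F

def Ey(k):
    """E[y^k] = int_0^1 int_0^{s^2} 3 t^k dt ds = 3 / ((k + 1)(2k + 3))."""
    return F(3, (k + 1) * (2 * k + 3))

def Exy(k):
    """E[x y^k] = int_0^1 int_0^{s^2} 3 s t^k dt ds = 3 / ((k + 1)(2k + 4))."""
    return F(3, (k + 1) * (2 * k + 4))

E_X2 = F(3, 5)   # E[x^2] = int_0^1 int_0^{s^2} 3 s^2 dt ds

def solve(M, b):
    """Solve M c = b exactly by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(b)
    aug = [[F(v) for v in row] + [F(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def poly_lmmse(K):
    """Coefficients beta_0..beta_K of (1.69) from the moment system (1.70), and the MSE."""
    M = [[Ey(i + j) for j in range(K + 1)] for i in range(K + 1)]
    b = [Exy(i) for i in range(K + 1)]
    coeffs = solve(M, b)
    mse = E_X2 - sum(c * r for c, r in zip(coeffs, b))   # E[x^2] - sum_k beta_k E[x y^k]
    return coeffs, mse

alpha, mse_lin = poly_lmmse(1)   # x_hat_1 = alpha_0 + alpha_1 y
beta, mse_quad = poly_lmmse(2)   # x_hat_2 = beta_0 + beta_1 y + beta_2 y^2
print(alpha)   # [Fraction(45, 74), Fraction(35, 74)]
print(beta)    # [Fraction(12915, 22558), Fraction(17745, 22558), Fraction(-4620, 11279)]
print(mse_lin, mse_quad < mse_lin)   # 19/740 True
```

As expected, adding the $y^2$ term cannot increase the MSE, since the estimators of the form $\alpha_0 + \alpha_1 y$ form a subspace of those of the form $\beta_0 + \beta_1 y + \beta_2 y^2$.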

General Optimal Estimation  The fact that estimators of the form (1.69) form a subspace hints at a more general fact: the set of all functions of a random variable forms a subspace in a vector space of random variables. While this may seem surprising or counterintuitive, verification of the properties required by Definition 1.2 is trivial. The subspace of functions of a random variable is furthermore closed, so several properties of general (not necessarily linear) MMSE estimation follow from the projection theorem.

As the simplest example, the constant $c$ that minimizes $E[(x - c)^2]$ can be interpreted as the best estimator of $x$ that depends on nothing random. We must have

$$0 \stackrel{(a)}{=} \langle x - c, 1\rangle \stackrel{(b)}{=} E[x - c] \stackrel{(c)}{=} E[x] - c,$$

where (a) follows from the orthogonality of the error $x - c$ to the deterministic function 1; (b) from (1.33); and (c) from linearity of the expectation. This derives the well-known fact that $c = E[x]$ is the constant that minimizes $E[(x - c)^2]$; see Appendix 1.C.3.

Now consider estimating $x$ from observation $y = t$. Conditioning yields a valid probability law, so

$$\langle x, z\rangle = E[x z^* \mid y = t] \tag{1.71}$$

is an inner product on the set of random variables with finite conditional second moment given $y = t$. Orthogonality of the error $x - \hat{x}_{\rm MMSE}(t)$ and any function of $t$ under the inner product (1.71) yields

$$0 \stackrel{(a)}{=} E[x - \hat{x}_{\rm MMSE}(t) \mid y = t] \stackrel{(b)}{=} E[x \mid y = t] - \hat{x}_{\rm MMSE}(t),$$

where (a) follows from considering specifically the function 1; and (b) from $\hat{x}_{\rm MMSE}(t)$ being a (deterministic) function of $t$. Thus, the optimal estimate is the conditional mean:

$$\hat{x}_{\rm MMSE}(t) = E[x \mid y = t], \tag{1.72a}$$

which is also written as

$$\hat{x}_{\rm MMSE} = E[x \mid y]. \tag{1.72b}$$



Example 1.25 (MMSE estimator, Example 1.24 cont'd) Consider $x$ and $y$ jointly distributed as in Example 1.24 (see Figure 1.17). Given an observation $y = t$, the conditional distribution of $x$ is uniform on $[\sqrt{t}, 1]$. The mean of this conditional distribution gives the optimal estimator

$$\hat{x}_{\rm MMSE}(t) = \tfrac{1}{2}\left(1 + \sqrt{t}\right).$$

This optimal estimate is shown as a red curve in Figure 1.17.
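A Monte Carlo sketch can confirm that the conditional mean behaves as the projection theorem predicts: its error should be (empirically) orthogonal to functions of $y$ such as $1$, $y$, and $y^2$. The sampler below is an illustrative construction from the joint PDF of Example 1.24. The marginal density of $x$ is $3s^2$ on $[0,1]$, and given $x = s$, $y$ is uniform on $[0, s^2]$. Averaging the conditional variance $(1 - \sqrt{t})^2/12$ against the marginal density $3(1 - \sqrt{t})$ of $y$ gives an exact MSE of $1/40$, slightly below the $19/740$ of the linear estimator $\hat{x}_1$. The sample values should land close to these targets, though the exact digits depend on the random seed.

```python
import math
import random

random.seed(1)          # fixed seed so the run is reproducible
N = 100_000
sum_sq = 0.0
corr = [0.0, 0.0, 0.0]  # running sums of (x - x_hat) * y**k for k = 0, 1, 2
for _ in range(N):
    x = random.random() ** (1 / 3)       # marginal density of x is 3 s^2 on [0, 1]
    y = x * x * random.random()          # given x = s, y is uniform on [0, s^2]
    err = x - 0.5 * (1 + math.sqrt(y))   # error of the conditional-mean estimate
    sum_sq += err * err
    for k in range(3):
        corr[k] += err * y ** k
mse = sum_sq / N
corr = [c / N for c in corr]
print(mse, corr)   # mse near 1/40; each sample correlation near 0
```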

Orthogonality and Optimal Estimation of Random Vectors

Use of the inner product (1.33) has given us a geometric interpretation for scalar random variables with valuable ramifications for optimal estimation. One can define various inner products for random vectors as well. However, we will see that more useful estimation results come from generalizing the concept of orthogonality rather than from using a single inner product.

One valid inner product for complex random vectors of length $N$ is obtained from the sum of inner products between components of the vectors, $\langle x_n, y_n\rangle$, using the inner product (1.33) between scalar random variables:

$$\langle x, y\rangle = \sum_{n=0}^{N-1} E[x_n y_n^*].$$

This is identical to the expectation of the standard inner product on $\mathbb{C}^N$, (1.20a), or $\langle x, y\rangle = E[y^* x]$.

With the projection theorem, one could optimize estimators of $x$ from $y^{(1)}, y^{(2)}, \ldots, y^{(K)}$ of the form

$$\hat{x} = \alpha_0 \mathbf{1} + \alpha_1 y^{(1)} + \alpha_2 y^{(2)} + \cdots + \alpha_K y^{(K)}, \tag{1.73}$$

where every component of $\mathbf{1} \in \mathbb{C}^N$ is 1. This is exactly as in (1.66), but now each element of $\{x, y^{(1)}, y^{(2)}, \ldots, y^{(K)}\}$ is a vector rather than a scalar.$^{14}$ Optimal coefficients are determined by solving a system of equations analogous to (1.68).

A weakness of the estimator $\hat{x}$ in (1.73) is that any single component of $\hat{x}$ depends only on the corresponding components of $\{y^{(1)}, y^{(2)}, \ldots, y^{(K)}\}$; other dependencies are not exploited. To take a simple example, suppose $x$ has the uniform distribution over the unit square $[0,1]^2$ and $[y_1 \;\; y_2]^T = [x_2 \;\; x_1]^T$. The vector $x$ can be estimated perfectly from the vector $y$, but $y_1$ is useless in estimating $x_1$ and $y_2$ is useless in estimating $x_2$. Thus estimators more general than (1.73) are commonly used.



14 Superscripts are introduced so we do not confuse indexing of the set with indexing of the 
components of a single vector. 
Linear Estimation  Let $x$ be a $\mathbb{C}^N$-valued random vector, and let $y$ be a $\mathbb{C}^M$-valued random vector. Suppose all components of both vectors have finite second moments. Consider the estimator of $x$ from $y$ given by$^{15}$

$$\hat{x} = A y, \tag{1.74}$$

where $A \in \mathbb{C}^{N \times M}$ is a constant matrix to be designed to minimize the MSE $E[\|x - \hat{x}\|^2]$. Note that unlike in (1.73), every component of $\hat{x}$ depends on every component of $y$.

Since each row of $A$ determines a different component of $\hat{x}$ and the MSE decouples across components as

$$E[\|x - \hat{x}\|^2] = \sum_{n=0}^{N-1} E[|x_n - \hat{x}_n|^2],$$

we can consider the design of each row of $A$ separately. Then for any fixed $n \in \{0, 1, \ldots, N-1\}$, the minimization of $E[|x_n - \hat{x}_n|^2]$ through the choice of the $n$th row of $A$ is a problem we have already solved: it is the scalar linear MMSE estimation problem. The solution is characterized by orthogonality of the error and the data as in (1.67).

The orthogonality of the $n$th error component $x_n - \hat{x}_n$ and the $m$th data component $y_m$ can be expressed as

$$0 \stackrel{(a)}{=} E[(x_n - \hat{x}_n) y_m^*] \stackrel{(b)}{=} E[(x_n - a_n^T y) y_m^*],$$

where (a) follows from the inner product (1.33); and (b) introduces $a_n^T$ as the $n$th row of $A$. Gathering these equations for $m = 0, 1, \ldots, M-1$ into one row gives

$$0_{1 \times M} = E[(x_n - a_n^T y) y^*], \qquad n = 0, 1, \ldots, N-1.$$

Now stacking these equations into a matrix gives

$$0_{N \times M} = E[(x - Ay) y^*]. \tag{1.75}$$

Using linearity of expectation, a necessary and sufficient condition for optimality is thus

$$E[x y^*] = A\, E[y y^*]. \tag{1.76}$$

In most cases, $E[y y^*]$ is invertible; the optimal estimator is then

$$\hat{x}_{\rm LMMSE} = E[x y^*] \left(E[y y^*]\right)^{-1} y. \tag{1.77}$$

When $E[y y^*]$ is not invertible, solutions to (1.76) are not unique but yield the same estimator.
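The swap example given earlier (with $[y_1 \;\; y_2]^T = [x_2 \;\; x_1]^T$ and $x$ uniform on $[0,1]^2$) shows how (1.77) exploits cross-component dependencies. The sketch below uses 0-based Python indices, the exact second moments $E[x_i^2] = 1/3$ and $E[x_1 x_2] = E[x_1]E[x_2] = 1/4$ (independent uniform components), and a hand-written 2×2 inverse; the helper names are hypothetical. Since all quantities are real, $^*$ reduces to transposition. The resulting $A$ is the exchange matrix, so $\hat{x} = Ay = x$ with zero error, whereas a single data component alone is uncorrelated with the matching component of $x$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# x uniform on [0,1]^2: E[x_i^2] = 1/3, E[x_0 x_1] = E[x_0] E[x_1] = 1/4.
Rxx = [[Fraction(1, 3), Fraction(1, 4)],
       [Fraction(1, 4), Fraction(1, 3)]]
J = [[0, 1], [1, 0]]                  # exchange matrix: y = J x swaps the components

Rxy = matmul(Rxx, J)                  # E[x y^T] = E[x x^T] J^T, with J^T = J
Ryy = matmul(matmul(J, Rxx), J)       # E[y y^T] = J E[x x^T] J^T
A = matmul(Rxy, inv2(Ryy))            # (1.77): A = E[x y^T] (E[y y^T])^{-1}
print(A == J)                         # True: x_hat = A y = J y = x exactly

cov = Rxx[0][1] - Fraction(1, 2) * Fraction(1, 2)   # Cov(x_0, y_0) = Cov(x_0, x_1)
print(cov)                                          # 0: y_0 alone says nothing about x_0
```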



Orthogonality of Random Vectors  Inspired by the usefulness of (1.75) in optimal estimation, we define a new orthogonality concept for random vectors.

15 In contrast to estimators (1.66) and (1.73), we have omitted a constant term from the estimator. There is no loss of generality because one can augment $y$ with a constant random variable.
Definition 1.32 (Orthogonal random vectors) Random vectors $x$ and $y$ are said to be orthogonal when $E[x y^*] = 0$.



Note that $E[x y^*]$ is not an inner product because it is not a scalar (except in the degenerate case where the random vectors have dimension 1 and are thus scalar random variables). Instead, random vectors $x$ and $y$ are orthogonal when every combination of components is orthogonal under inner product (1.33):

$$E[x_n y_m^*] = 0 \quad \text{for every } m \text{ and } n.$$

In (1.75) we have an instance of a more general fact: Any time an estimator $\hat{x}$ of random vector $x$ is optimized over a closed subspace $S$ of possible estimators, the optimal estimator will be determined by $x - \hat{x} \perp S$ under the sense of orthogonality in Definition 1.32. We will apply this to LMMSE estimation of discrete-time random processes in Chapter 2.

1.5 Bases and Frames

The variety of bases and frames for sequences in $\ell^2(\mathbb{Z})$ and functions in $\mathcal{L}^2(\mathbb{R})$ is at the heart of this book. In this section, we develop general properties of bases and frames, with an emphasis on representing vectors in Hilbert spaces using bases. Bases will come in two flavors, orthonormal and biorthogonal; analogously, frames will come in two flavors, tight and general. In later chapters, we will see that the choice of a basis or frame can have dramatic effects on the computational complexity, stability, and approximation accuracy of signal expansions.

Prominent in the developments of Section 1.4 were closed subspaces: We saw best approximation in a closed subspace, projection onto a closed subspace, and direct sum decomposition into a pair of closed subspaces. In this section, we will see that a basis induces a direct sum decomposition into a possibly infinite number of one-dimensional subspaces; and bases, especially orthonormal ones, facilitate the computation of projections and approximations. These developments reduce our level of abstraction and bring us closer to computational tools for signal processing. Specifically, Section 1.5.5 shows how representations with bases replace general vector space computations with matrix computations, albeit possibly infinite ones.

1.5.1 Bases and Riesz Bases 

A basis is a set of vectors that is used to uniquely represent any vector in a vector 
space as a linear combination of basis elements. It is of minimal size in that it 
is a linearly independent set. The definition applies in any normed vector space, 
including any Hilbert space (where the norm is induced by an inner product). 

Definition 1.33 (Basis) The set of vectors $\Phi = \{\varphi_k\}_{k \in \mathcal{K}} \subset V$, where $\mathcal{K}$ is finite or countably infinite, is called a basis for the normed vector space $V$ when

(i) it is linearly independent; and
(ii) it is complete in $V$, meaning

$$V = \overline{\operatorname{span}}(\Phi). \tag{1.78}$$



In this definition, the closure of the span is needed to allow linear combinations with infinitely many terms.$^{16}$ The completeness requirement implies that any $x \in V$ has an expansion with respect to the basis $\Phi$ of the form

$$x = \sum_{k \in \mathcal{K}} \alpha_k \varphi_k, \tag{1.79}$$

while the linear independence requirement implies that the expansion is unique: Given another expansion $x = \sum_{k \in \mathcal{K}} \beta_k \varphi_k$, subtracting it from (1.79) gives $0 = \sum_{k \in \mathcal{K}} (\alpha_k - \beta_k) \varphi_k$, so linear independence implies $\alpha_k - \beta_k = 0$ for all $k \in \mathcal{K}$. The coefficients $\{\alpha_k\}_{k \in \mathcal{K}}$ are called the expansion coefficients$^{17}$ of $x$ with respect to the basis $\Phi$.

From Definition 1.6 it is clear that when a vector space has finite dimension $N$, its bases contain precisely $N$ elements. The infinite-dimensional Hilbert spaces that we consider have countably infinite bases because they are separable (see Section 1.3.2).

Example 1.26 (Standard basis for $\mathbb{C}^N$) The standard basis for $\mathbb{R}^2$ was introduced in Section 1.1. It is easily extended to $\mathbb{C}^N$ (or $\mathbb{R}^N$) with

$$e_k = [\underbrace{0 \;\; \cdots \;\; 0}_{k \text{ 0s}} \;\; 1 \;\; \underbrace{0 \;\; \cdots \;\; 0}_{(N-k-1) \text{ 0s}}]^T, \qquad k = 0, 1, \ldots, N-1.$$

The set $\{e_k\}_{k=0}^{N-1}$ is both linearly independent and complete, and it is thus a basis. For completeness, note that any vector $v = [v_0 \;\; v_1 \;\; \cdots \;\; v_{N-1}]^T \in \mathbb{C}^N$ is precisely the finite linear combination $v = \sum_{k=0}^{N-1} v_k e_k$; the closure in (1.78) is not needed.

Example 1.27 (Standard basis for $\mathbb{C}^{\mathbb{Z}}$) The standard basis concept extends also to some normed vector spaces of complex-valued sequences over $\mathbb{Z}$. Consider $E = \{e_k\}_{k \in \mathbb{Z}}$, where $e_k \in \mathbb{C}^{\mathbb{Z}}$ is the sequence that is 0 except for a 1 in the $k$-indexed position. The set $E$ is clearly linearly independent, but whether it is complete depends on the norm.

The set $E$ is complete under the $\ell^p$ norm with $p \in [1, \infty)$, meaning that $\ell^p(\mathbb{Z}) = \overline{\operatorname{span}}(E)$ when the closure of the span is defined with the $\ell^p$ norm.



16 Recall that the span of an infinite set of vectors is defined as the set of all finite linear combinations (Definition 1.4).

17 Expansion coefficients are sometimes called Fourier coefficients or generalized Fourier coefficients, but we will avoid these terms except when the expansions are with respect to the specific bases that yield the various Fourier transforms. They are also called transform coefficients or subband coefficients in the source coding literature.
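To show this, we establish $\overline{\operatorname{span}}(E) \subseteq \ell^p(\mathbb{Z})$ and $\ell^p(\mathbb{Z}) \subseteq \overline{\operatorname{span}}(E)$. The first inclusion, $\overline{\operatorname{span}}(E) \subseteq \ell^p(\mathbb{Z})$, holds because $\ell^p(\mathbb{Z})$ is a complete vector space; see Section 1.3.2. It remains to show $\ell^p(\mathbb{Z}) \subseteq \overline{\operatorname{span}}(E)$. The meaning of $x \in \ell^p(\mathbb{Z})$ is that $\sum_{n \in \mathbb{Z}} |x_n|^p$ is convergent. Thus,

$$\lim_{M \to \infty} \left( \sum_{n=-\infty}^{-M-1} |x_n|^p + \sum_{n=M+1}^{\infty} |x_n|^p \right) = 0.$$

This limit shows that the sequence $y_M = \sum_{n=-M}^{M} x_n e_n$, $M = 0, 1, \ldots$, converges to $x$ under the $\ell^p(\mathbb{Z})$ norm. Since every $y_M$ is in $\operatorname{span}(E)$, the limit of the sequence, $x$, is in $\overline{\operatorname{span}}(E)$. Thus $\ell^p(\mathbb{Z}) \subseteq \overline{\operatorname{span}}(E)$.

Changing the norm changes the meaning of the closure of the span and thus can make $E$ not be a basis. The set $E$ is not complete under the $\ell^\infty$ norm since $\ell^\infty(\mathbb{Z}) \not\subseteq \overline{\operatorname{span}}(E)$. To show this, let $x$ be the all-1s sequence, which is in $\ell^\infty(\mathbb{Z})$. Every sequence in $\operatorname{span}(E)$ has a finite number of nonzero entries and is therefore at least distance 1 from $x$ under the $\ell^\infty$ norm. Therefore $x$ is not in $\overline{\operatorname{span}}(E)$.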
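The contrast between the $\ell^2$ and $\ell^\infty$ cases can be seen numerically. This sketch stores sequences as dictionaries over a finite window $|n| \le 200$ (a truncation assumption; the true sequences are infinite), uses $x_n = 2^{-|n|}$ as an arbitrary element of $\ell^2(\mathbb{Z})$, and measures how far the truncations $y_M$ are from $x$. For the all-1s sequence, each $y_M$ stays at $\ell^\infty$ distance exactly 1. The helpers `dist` and `truncate` are hypothetical names.

```python
L = 200   # sequences are stored only for |n| <= L (truncation assumption)

def dist(x, y, p):
    """l^p distance between two finitely stored sequences (dicts n -> value)."""
    diffs = [abs(x.get(n, 0.0) - y.get(n, 0.0)) for n in set(x) | set(y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

def truncate(x, M):
    """y_M = sum_{n=-M}^{M} x_n e_n: keep only the entries with |n| <= M."""
    return {n: v for n, v in x.items() if abs(n) <= M}

x = {n: 2.0 ** (-abs(n)) for n in range(-L, L + 1)}   # an l^2 sequence
ones = {n: 1.0 for n in range(-L, L + 1)}              # all-1s sequence on the window

Ms = [0, 5, 10, 20]
l2_err = [dist(x, truncate(x, M), 2) for M in Ms]
linf_err = [dist(ones, truncate(ones, M), float("inf")) for M in Ms]
print(l2_err)     # shrinks toward 0 as M grows
print(linf_err)   # [1.0, 1.0, 1.0, 1.0]
```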

Riesz Bases  While the previous example provides an important note of caution on the meaning of a basis, dependence on the choice of norm is not our focus. In fact, we will primarily focus on the $\ell^2$ and $\mathcal{L}^2$ Hilbert space norms. The more important complication with infinite-dimensional spaces is that a basis can be prohibitively ill-suited to numerical computations. Specifically, it is not practical to allow coefficients in a linear combination to be unbounded or to require very small coefficients to be distinguished from zero; the concept of a Riesz basis restricts bases to avoid these pitfalls.



Definition 1.34 (Riesz basis) The set of vectors $\Phi = \{\varphi_k\}_{k \in \mathcal{K}} \subset H$, where $\mathcal{K}$ is finite or countably infinite, is called a Riesz basis for Hilbert space $H$ when

(i) it is a basis for $H$; and
(ii) there exist strictly positive constants $\lambda_{\min}$ and $\lambda_{\max}$ such that, for any $x$ in $H$, the expansion of $x$ with respect to the basis $\Phi$, $x = \sum_{k \in \mathcal{K}} \alpha_k \varphi_k$, satisfies

$$\lambda_{\min} \|x\|^2 \le \sum_{k \in \mathcal{K}} |\alpha_k|^2 \le \lambda_{\max} \|x\|^2. \tag{1.80}$$



In $\mathbb{C}^N$ or $\ell^2(\mathbb{Z})$, the standard basis is a Riesz basis with $\lambda_{\min} = \lambda_{\max} = 1$ (verification is left as an exercise). Conversely, Riesz bases with $\lambda_{\min} = \lambda_{\max} = 1$ are orthonormal bases, as developed in Section 1.5.2. As we introduce a variety of bases for different purposes, it will be a virtue to have $\lambda_{\min} \approx \lambda_{\max}$, though we may relax this requirement to achieve other objectives.
Example 1.28 (Riesz bases in $\mathbb{R}^2$) Any two vectors $\varphi_0$ and $\varphi_1$ (with $\varphi_0 \neq 0$) are a basis for $\mathbb{R}^2$ as long as there is no scalar $\alpha$ such that $\varphi_1 = \alpha \varphi_0$. We fix $\varphi_0 = e_0$ and vary $\varphi_1$ in two ways to illustrate deviations from the standard basis:

(i) Let $\varphi_1 = a e_1$ with $a \in (0, \infty)$, as illustrated in Figure 1.18(a). The unique expansion of $x = [x_0 \;\; x_1]^T$ is then

$$x = x_0 \varphi_0 + (x_1/a) \varphi_1 = \alpha_0 \varphi_0 + \alpha_1 \varphi_1.$$

The largest $\lambda_{\min}$ such that (1.80) holds is

$$\lambda_{\min} = \inf_{x \in \mathbb{R}^2,\, x \neq 0} \frac{x_0^2 + (x_1/a)^2}{x_0^2 + x_1^2} = \begin{cases} 1, & \text{for } a \in (0, 1]; \\ 1/a^2, & \text{for } a \in (1, \infty). \end{cases}$$

This means that by making $a$ very large, the basis becomes numerically ill-conditioned in the sense that there are nonzero vectors $x$ with very small expansion coefficients.

Similarly, the smallest $\lambda_{\max}$ such that (1.80) holds is

$$\lambda_{\max} = \sup_{x \in \mathbb{R}^2,\, x \neq 0} \frac{x_0^2 + (x_1/a)^2}{x_0^2 + x_1^2} = \begin{cases} 1/a^2, & \text{for } a \in (0, 1]; \\ 1, & \text{for } a \in (1, \infty). \end{cases}$$

This means that by making $a$ close to zero, the basis becomes numerically ill-conditioned in the sense that there are vectors $x$ with very large expansion coefficients. Figure 1.18(c) shows $\lambda_{\min}$ and $\lambda_{\max}$ as functions of $a$.

(ii) Let $\varphi_1 = [\cos\theta \;\; \sin\theta]^T$ with $\theta \in (0, \pi/2]$, as illustrated in Figure 1.18(b). The unique expansion of $x = [x_0 \;\; x_1]^T$ is then

$$x = (x_0 - \cot\theta \, x_1) \varphi_0 + (\csc\theta \, x_1) \varphi_1 = \alpha_0 \varphi_0 + \alpha_1 \varphi_1.$$

Using trigonometric identities, one can show that the largest $\lambda_{\min}$ and smallest $\lambda_{\max}$ such that (1.80) holds are

$$\lambda_{\min} = \tfrac{1}{2}\sec^2(\theta/2) \quad\text{and}\quad \lambda_{\max} = \tfrac{1}{2}\csc^2(\theta/2),$$

shown in Figure 1.18(d) as functions of $\theta$. The numerical conditioning is ideal when $\theta = \pi/2$, in which case $\{\varphi_0, \varphi_1\}$ is the standard basis, while it is extremely poor for small $\theta$, resulting in very large expansion coefficients.
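The bounds in Example 1.28 can be computed directly. Writing $\Phi = [\varphi_0 \;\; \varphi_1]$, the coefficients are $\alpha = \Phi^{-1}x$, so $\|\alpha\|^2 = x^T(\Phi\Phi^T)^{-1}x$, and the tightest constants in (1.80) are the reciprocals of the extreme eigenvalues of $\Phi\Phi^T$. The sketch below (the function name `riesz_bounds` is a hypothetical choice) evaluates this with the closed-form eigenvalues of a symmetric 2×2 matrix and compares against the formulas in parts (i) and (ii) for a few sample values of $a$ and $\theta$.

```python
import math

def riesz_bounds(phi0, phi1):
    """Tightest (lambda_min, lambda_max) in (1.80) for the basis {phi0, phi1} of R^2.

    With Phi = [phi0 phi1] (as columns), alpha = Phi^{-1} x and
    ||alpha||^2 = x^T (Phi Phi^T)^{-1} x, so the bounds are the reciprocals of the
    largest and smallest eigenvalues of G = Phi Phi^T.
    """
    g00 = phi0[0] ** 2 + phi1[0] ** 2
    g01 = phi0[0] * phi0[1] + phi1[0] * phi1[1]
    g11 = phi0[1] ** 2 + phi1[1] ** 2
    mean = (g00 + g11) / 2
    radius = math.hypot((g00 - g11) / 2, g01)   # eigenvalues of G are mean +/- radius
    return 1 / (mean + radius), 1 / (mean - radius)

e0 = (1.0, 0.0)
checks = []
for a in (0.25, 1.0, 4.0):                          # part (i): phi1 = a e1
    lo, hi = riesz_bounds(e0, (0.0, a))
    checks.append(math.isclose(lo, min(1.0, 1 / a ** 2))
                  and math.isclose(hi, max(1.0, 1 / a ** 2)))
for theta in (0.1, math.pi / 4, math.pi / 2):       # part (ii): phi1 = [cos, sin]
    lo, hi = riesz_bounds(e0, (math.cos(theta), math.sin(theta)))
    checks.append(math.isclose(lo, 0.5 / math.cos(theta / 2) ** 2)
                  and math.isclose(hi, 0.5 / math.sin(theta / 2) ** 2))
print(all(checks))   # True
```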

The previous example illustrates two ways of deviating from the standard basis: lacking unit norm and lacking orthogonality. While the effects of these deviations can make numerical conditioning arbitrarily bad, any basis of a finite-dimensional Hilbert space is a Riesz basis. (In the first part of the example, $a$ must be nonzero for $\{\varphi_0, \varphi_1\}$ to be a basis, and this keeps $\lambda_{\min}$ strictly positive and $\lambda_{\max}$ finite. Similarly, in the second part $\theta$ must be nonzero, and this keeps $\lambda_{\max}$ finite.) The following infinite-dimensional examples show that some bases are not Riesz bases.

Example 1.29 (Bases that are not Riesz bases) 
Figure 1.18: Two families of bases in $\mathbb{R}^2$ that deviate from the standard basis $\{e_0, e_1\}$ and their Riesz basis stability constants $\lambda_{\min}$ (solid) and $\lambda_{\max}$ (dashed). (a) $\varphi_1$ is orthogonal to $\varphi_0$ but not necessarily of unit length. (b) $\varphi_1$ is of unit length but not necessarily orthogonal to $\varphi_0$. (c) $\lambda_{\min}$ and $\lambda_{\max}$ for the basis in (a) as a function of $a$. (d) $\lambda_{\min}$ and $\lambda_{\max}$ for the basis in (b) as a function of $\theta$.

(i) Consider the following scaled version of the standard basis in $\ell^2(\mathbb{Z})$:

$$ \varphi_k = (1 + |k|)^{-1} e_k, \quad k \in \mathbb{Z}. $$

The ratio of lengths of elements $\|\varphi_0\|/\|\varphi_k\| = 1 + |k|$ is unbounded, so this is
intuitively similar to letting $\alpha \to 0$ or $\alpha \to \infty$ in Example 1.28(i). The set
$\Phi = \{\varphi_k\}_{k\in\mathbb{Z}}$ is a basis for $\ell^2(\mathbb{Z})$, but it is not a Riesz basis.

To prove that $\Phi$ is not a Riesz basis, we show that no finite $\lambda_{\max}$
satisfies (1.80). Suppose there is a finite $\lambda_{\max}$ such that (1.80) holds for all
$x \in \ell^2(\mathbb{Z})$. Then we can choose an integer $M > \sqrt{\lambda_{\max}}$ and let $x \in \ell^2(\mathbb{Z})$
be the sequence that is 0 except for a 1 in the $M$-indexed position. The
unique representation of this $x$ using the basis $\Phi$ is $x = (1 + |M|)\varphi_M$; that
is, the coefficients in the expansion are

$$ \alpha_k = \begin{cases} 1 + |M|, & \text{for } k = M; \\ 0, & \text{otherwise}. \end{cases} $$

Then $\|\alpha\|^2 = (1 + |M|)^2 > M^2 > \lambda_{\max} = \lambda_{\max}\|x\|^2$, so the second inequality of (1.80) is contradicted, and there does not exist the
desired finite $\lambda_{\max}$.
(ii) Consider the following vectors defined using the standard basis in $\ell^2(\mathbb{N})$:

$$ \varphi_k = \sum_{i=0}^{k} (k+1)^{-1/2} e_i, \quad k \in \mathbb{N}. $$

The angle between consecutive elements approaches zero as $k \to \infty$ because
$\langle \varphi_{k+1}, \varphi_k\rangle = \sqrt{(k+1)/(k+2)} \to 1 = \|\varphi_{k+1}\|\,\|\varphi_k\|$; so this is intuitively similar to letting $\theta \to 0$
in Example 1.28(ii). Proving that the set $\Phi = \{\varphi_k\}_{k\in\mathbb{N}}$ is a basis for $\ell^2(\mathbb{N})$
but is not a Riesz basis is left for Exercise 1.36.
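A small numerical sketch of part (ii), in Python/NumPy. Because each $\varphi_k$ has only $k+1$ nonzero entries, we assume it is safe to store the sequences truncated to a finite length `n`; the helper `phi` is hypothetical, not from the text. The printed cosines approach 1, so the angle between consecutive elements shrinks toward zero.

```python
import numpy as np


def phi(k, n):
    """phi_k = (k+1)^(-1/2) (e_0 + ... + e_k), stored as a length-n array (n > k)."""
    v = np.zeros(n)
    v[:k + 1] = 1.0 / np.sqrt(k + 1)
    return v


n = 2000
for k in [0, 10, 100, 1000]:
    c = np.dot(phi(k + 1, n), phi(k, n))   # cosine of the angle, since both have unit norm
    print(k, c, np.degrees(np.arccos(c)))  # c equals sqrt((k+1)/(k+2))
```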

In the subsequent developments, it will be desirable for all bases to be Riesz bases. 

Operators Associated with Bases  Given a Riesz basis $\{\varphi_k\}_{k\in\mathcal{K}}$, the expansion
formula (1.79) can be viewed as mapping a coefficient sequence $\alpha$ to a vector $x$.
This mapping is clearly linear. Let us suppose that the coefficient sequence has
finite $\ell^2(\mathcal{K})$ norm. The first inequality of (1.80) implies that the vector $x$ given by
(1.79) has finite norm, at most $\|\alpha\|/\sqrt{\lambda_{\min}}$, and is thus legitimately in $H$.



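Definition 1.35 (Basis synthesis operator) Given a Riesz basis $\{\varphi_k\}_{k\in\mathcal{K}}$, the synthesis operator associated with it is

$$ \Phi : \ell^2(\mathcal{K}) \to H, \quad \text{with} \quad \Phi\alpha = \sum_{k\in\mathcal{K}} \alpha_k \varphi_k. \tag{1.81} $$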



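The first inequality of (1.80) implies that the norm of this linear operator is at
most $1/\sqrt{\lambda_{\min}}$; the operator $\Phi$ is thus not only linear but bounded as well.

The adjoint of $\Phi$ maps from $H$ to a sequence in $\ell^2(\mathcal{K})$. To derive the adjoint,
consider the following computation for arbitrary $\alpha \in \ell^2(\mathcal{K})$ and $y \in H$:

$$ \langle \Phi\alpha, y\rangle \overset{(a)}{=} \Big\langle \sum_{k\in\mathcal{K}} \alpha_k\varphi_k, y \Big\rangle \overset{(b)}{=} \sum_{k\in\mathcal{K}} \alpha_k \langle \varphi_k, y\rangle \overset{(c)}{=} \sum_{k\in\mathcal{K}} \alpha_k \langle y, \varphi_k\rangle^*, $$

where (a) follows from (1.81); (b) from the linearity in the first argument of the
inner product; and (c) from the Hermitian symmetry of the inner product. The final
expression is the $\ell^2(\mathcal{K})$ inner product between $\alpha$ and the sequence of inner products
$\{\langle y, \varphi_k\rangle\}_{k\in\mathcal{K}}$. The adjoint is called the analysis operator:

Definition 1.36 (Basis analysis operator) Given a Riesz basis $\{\varphi_k\}_{k\in\mathcal{K}}$, the analysis operator associated with it is

$$ \Phi^* : H \to \ell^2(\mathcal{K}), \quad \text{with} \quad (\Phi^* x)_k = \langle x, \varphi_k\rangle, \quad k \in \mathcal{K}. \tag{1.82} $$

Equation (1.82) holds since $\langle \Phi\alpha, y\rangle_H = \langle \alpha, \Phi^* y\rangle_{\ell^2}$. The norm of the analysis operator
is also at most $1/\sqrt{\lambda_{\min}}$ because $\|A\| = \|A^*\|$ for all bounded linear operators $A$.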
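In $\mathbb{C}^N$ these operators are just matrices: assuming the basis vectors are stored as the columns of a matrix `Phi` (our convention, for illustration), $\Phi$ acts by matrix-vector multiplication and $\Phi^*$ is the conjugate transpose. The Python/NumPy sketch below checks the adjoint relation $\langle\Phi\alpha, y\rangle = \langle\alpha, \Phi^*y\rangle$ for random data and the equality of operator norms $\|\Phi\| = \|\Phi^*\|$. The helper `inner` is hypothetical and follows the convention, used in the text, that the inner product is linear in its first argument.

```python
import numpy as np

rng = np.random.default_rng(0)


def inner(u, v):
    """<u, v> = sum_n u_n v_n^* (linear in the first argument)."""
    return np.vdot(v, u)   # np.vdot conjugates its first argument


theta = np.pi / 6
Phi = np.array([[1.0, np.cos(theta)],        # synthesis operator: columns phi_0, phi_1
                [0.0, np.sin(theta)]])
Phi_star = Phi.conj().T                      # analysis operator: (Phi^* y)_k = <y, phi_k>

alpha = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)

print(np.isclose(inner(Phi @ alpha, y), inner(alpha, Phi_star @ y)))   # adjoint relation
print(np.linalg.norm(Phi, 2), np.linalg.norm(Phi_star, 2))            # equal operator norms
```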

1.5.2 Orthonormal Bases 

An orthonormal basis is a basis of orthogonal, unit-norm vectors: 
Definition 1.37 (Orthonormal basis) The set of vectors $\Phi = \{\varphi_k\}_{k\in\mathcal{K}} \subset H$, where $\mathcal{K}$ is finite or countably infinite, is called an orthonormal basis for the Hilbert space $H$ when

(i) it is a basis for $H$; and

(ii) it is orthonormal,

$$ \langle \varphi_i, \varphi_k\rangle = \delta_{i-k} \quad \text{for every } i, k \in \mathcal{K}. \tag{1.83} $$



Since orthonormality implies linear independence, we could alternatively say that
a set $\Phi = \{\varphi_k\}_{k\in\mathcal{K}} \subset H$ satisfying (1.83) is an orthonormal basis whenever it is
complete, that is, $\overline{\operatorname{span}}(\Phi) = H$.

Standard bases are orthonormal bases. Two more examples follow, and we 
will see many more examples throughout the book. 

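Example 1.30 (Finite-dimensional orthonormal basis) The vectors

$$ \varphi_0 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \varphi_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \quad\text{and}\quad \varphi_2 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} $$

are orthonormal, as can be verified by direct computation. Since three linearly
independent vectors in $\mathbb{C}^3$ always form a basis for $\mathbb{C}^3$, $\{\varphi_0, \varphi_1, \varphi_2\}$ is an
orthonormal basis for $\mathbb{C}^3$.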
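The "direct computation" can be delegated to a few lines of Python/NumPy (a sketch; the variable names are ours). Stacking the vectors as columns of `Phi`, orthonormality is equivalent to $\Phi^*\Phi$ being the identity, and for a square matrix this also gives $\Phi\Phi^* = I$.

```python
import numpy as np

# Columns are phi_0, phi_1, phi_2 from Example 1.30
Phi = np.column_stack([
    np.array([1, 1, 0]) / np.sqrt(2),
    np.array([-1, 1, 2]) / np.sqrt(6),
    np.array([1, -1, 1]) / np.sqrt(3),
])

inner_products = Phi.conj().T @ Phi   # entry [i, k] is <phi_k, phi_i>
print(np.round(inner_products, 12))   # identity matrix, i.e., (1.83) holds
print(np.allclose(Phi @ Phi.conj().T, np.eye(3)))
```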

Example 1.31 (Orthonormal basis of cosine functions) Consider $\Phi = \{\varphi_k\}_{k\in\mathbb{N}} \subset \mathcal{L}^2([-\tfrac12, \tfrac12])$ defined in (1.21). The first three functions in this set were
shown in Figure 1.5. Example 1.6(iii) showed that $\Phi$ satisfies the orthonormality
condition (1.83). Since orthonormality also implies linear independence, $\Phi$ is an
orthonormal basis for $S = \overline{\operatorname{span}}(\Phi)$. (Remember that $S$ is itself a Hilbert space.)
The set $\Phi$ is not, however, an orthonormal basis for $\mathcal{L}^2([-\tfrac12, \tfrac12])$ because $S$ is a
proper subspace of $\mathcal{L}^2([-\tfrac12, \tfrac12])$; for example, no odd functions are in $S$.



Expansion and Inner Product Computation Expansion coefficients with respect 
to an orthonormal basis are obtained by using the same basis for signal analysis. 



Theorem 1.38 (Orthonormal basis expansions) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ be an
orthonormal basis for Hilbert space $H$. The unique expansion with respect to $\Phi$
of any $x$ in $H$ has expansion coefficients

$$ \alpha_k = \langle x, \varphi_k\rangle \quad \text{for } k \in \mathcal{K}, \tag{1.84a} $$

or,

$$ \alpha = \Phi^* x. \tag{1.84b} $$

Synthesis with these coefficients yields

$$ x = \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \varphi_k \tag{1.85a} $$

$$ = \Phi\alpha = \Phi\Phi^* x. \tag{1.85b} $$



Proof. The existence of a unique linear combination of the form (1.79) is guaranteed
by $\Phi$ being a basis. The validity of (1.85a) with coefficients (1.84a) follows from the
following computation:

$$ \langle x, \varphi_k\rangle \overset{(a)}{=} \Big\langle \sum_{i\in\mathcal{K}} \alpha_i\varphi_i, \varphi_k \Big\rangle \overset{(b)}{=} \sum_{i\in\mathcal{K}} \alpha_i \langle \varphi_i, \varphi_k\rangle \overset{(c)}{=} \sum_{i\in\mathcal{K}} \alpha_i \delta_{i-k} \overset{(d)}{=} \alpha_k, $$

where (a) follows from (1.79); (b) from the linearity in the first argument of the inner
product; (c) from the orthonormality of the set $\Phi$, (1.83); and (d) from the definition
of the Kronecker delta sequence, (1.9).

The expressions (1.84b) and (1.85b) are equivalent to (1.84a) and (1.85a) using
the operators defined in (1.81) and (1.82).



Since (1.85b) holds for all $x$ in $H$,

$$ \Phi\Phi^* = I \quad \text{on } H. \tag{1.86} $$

This leads to the frequently used properties¹⁸ in the following theorem:



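Theorem 1.39 (Parseval's equalities) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ be an orthonormal
basis for Hilbert space $H$. Expansion with coefficients (1.84) satisfies

$$ \|x\|^2 = \sum_{k\in\mathcal{K}} |\langle x, \varphi_k\rangle|^2 \tag{1.87a} $$

$$ = \|\Phi^* x\|^2 = \|\alpha\|^2. \tag{1.87b} $$

More generally,

$$ \langle x, y\rangle = \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \langle y, \varphi_k\rangle^* \tag{1.88a} $$

$$ = \langle \Phi^* x, \Phi^* y\rangle = \langle \alpha, \beta\rangle. \tag{1.88b} $$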



Proof. Recall the equivalence of (1.50) and (1.52a). Thus (1.86) is equivalent to (1.88b);
indeed, $\langle \Phi^* x, \Phi^* y\rangle = \langle \Phi\Phi^* x, y\rangle = \langle x, y\rangle$.
Setting $x = y$ in (1.88b) yields (1.87b). Equalities (1.87a) and (1.88a) are the same
facts expanded with the definition of $\Phi^*$.

Proving these equalities with operator notation and (1.86) is much less tedious
than proving them directly. To see this, here is a direct proof of (1.88):

$$ \langle x, y\rangle \overset{(a)}{=} \Big\langle \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \varphi_k, y \Big\rangle \overset{(b)}{=} \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \langle \varphi_k, y\rangle \overset{(c)}{=} \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \langle y, \varphi_k\rangle^*, $$

where (a) follows from expanding $x$ with (1.85a); (b) from the linearity in the first
argument of the inner product; and (c) from the Hermitian symmetry of the inner
product.

¹⁸ The first of these, (1.87a), is the one most often referred to as Parseval's equality.
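A numerical sanity check of (1.87) and (1.88), written as a Python/NumPy sketch with the orthonormal basis of Example 1.30 and randomly chosen complex vectors (the random data are only illustrative). As in the other sketches here, `np.vdot(y, x)` computes $\langle x, y\rangle = \sum_n x_n y_n^*$.

```python
import numpy as np

Phi = np.column_stack([np.array([1, 1, 0]) / np.sqrt(2),
                       np.array([-1, 1, 2]) / np.sqrt(6),
                       np.array([1, -1, 1]) / np.sqrt(3)])
rng = np.random.default_rng(1)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

alpha = Phi.conj().T @ x     # alpha_k = <x, phi_k>, as in (1.84)
beta = Phi.conj().T @ y      # beta_k = <y, phi_k>

print(np.isclose(np.linalg.norm(x) ** 2, np.sum(np.abs(alpha) ** 2)))   # (1.87)
print(np.isclose(np.vdot(y, x), np.vdot(beta, alpha)))                  # (1.88)
```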

The simple equality (1.88) captures an important role played by any orthonormal
basis: it turns an abstract inner product computation into a computation with
sequences. When $x = \sum_{k\in\mathcal{K}} \alpha_k\varphi_k$ and $y = \sum_{k\in\mathcal{K}} \beta_k\varphi_k$ as in the theorem,

$$ \langle x, y\rangle_H = \langle \alpha, \beta\rangle_{\ell^2(\mathcal{K})} = \sum_{k\in\mathcal{K}} \alpha_k\beta_k^*, \tag{1.89} $$

where the final computation is an $\ell^2(\mathcal{K})$ inner product even though the first inner
product is in an arbitrary Hilbert space. We may view this more concretely with
matrix multiplication as

$$ \langle x, y\rangle = \beta^*\alpha, \tag{1.90} $$

where $\alpha$ and $\beta$ are column vectors.

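Example 1.32 (Inner product computation by expansion sequences) Let
$\alpha$ and $\beta$ be sequences in $\ell^2(\mathbb{N})$. Then the functions

$$ x(t) = \alpha_0 + \sum_{k=1}^{\infty} \alpha_k\sqrt{2}\cos(2\pi kt), \tag{1.91a} $$

$$ y(t) = \beta_0 + \sum_{k=1}^{\infty} \beta_k\sqrt{2}\cos(2\pi kt), \tag{1.91b} $$

are in $\mathcal{L}^2([-\tfrac12, \tfrac12])$, and their inner product can be simplified as follows:

$$ \begin{aligned} \langle x, y\rangle &= \int_{-1/2}^{1/2} \Big(\alpha_0 + \sum_{k=1}^{\infty}\alpha_k\sqrt{2}\cos(2\pi kt)\Big)\Big(\beta_0^* + \sum_{\ell=1}^{\infty}\beta_\ell^*\sqrt{2}\cos(2\pi \ell t)\Big)\,dt \\ &= \alpha_0\beta_0^* + \alpha_0\sum_{\ell=1}^{\infty}\beta_\ell^*\int_{-1/2}^{1/2}\sqrt{2}\cos(2\pi\ell t)\,dt + \beta_0^*\sum_{k=1}^{\infty}\alpha_k\int_{-1/2}^{1/2}\sqrt{2}\cos(2\pi kt)\,dt \\ &\quad + \sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty}\alpha_k\beta_\ell^*\int_{-1/2}^{1/2}2\cos(2\pi kt)\cos(2\pi\ell t)\,dt \\ &= \alpha_0\beta_0^* + \sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty}\alpha_k\beta_\ell^*\,\delta_{k-\ell} \overset{(a)}{=} \alpha_0\beta_0^* + \sum_{k=1}^{\infty}\alpha_k\beta_k^* = \sum_{k=0}^{\infty}\alpha_k\beta_k^* \overset{(b)}{=} \langle \alpha, \beta\rangle, \end{aligned} $$

where the two middle integrals vanish because each integrates a cosine over a whole
number of periods, and the last integral equals $\delta_{k-\ell}$ for $k, \ell \ge 1$; (a) follows from the
definition of the Kronecker delta sequence, (1.9); and (b) from the definition of the
$\ell^2$ inner product between sequences.

Recalling the orthonormal basis from Example 1.31, what we have shown
is that a more complicated (integral) inner product in $\mathcal{L}^2([-\tfrac12, \tfrac12])$ can be computed
via a simpler (series) inner product between expansion coefficients in $\ell^2(\mathbb{N})$.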
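The same identity can be checked numerically when only finitely many coefficients are nonzero, an assumption made here so that the sums are finite. In the Python/NumPy sketch below, the integral over $[-\tfrac12, \tfrac12]$ is approximated by a rectangle rule on a uniform grid of `N` points; for trigonometric polynomials whose frequencies stay below `N`, this rule is exact up to rounding, so the two numbers should agree. The helper `cosine_series` is hypothetical.

```python
import numpy as np


def cosine_series(coeffs, t):
    """x(t) = a_0 + sum_{k>=1} a_k sqrt(2) cos(2 pi k t), as in (1.91)."""
    x = np.full(t.shape, coeffs[0], dtype=complex)
    for k in range(1, len(coeffs)):
        x += coeffs[k] * np.sqrt(2) * np.cos(2 * np.pi * k * t)
    return x


rng = np.random.default_rng(2)
K = 8                                    # assumption: only alpha_0..alpha_7, beta_0..beta_7 nonzero
alpha = rng.standard_normal(K) + 1j * rng.standard_normal(K)
beta = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# Rectangle rule over one period [-1/2, 1/2): the product x(t) y(t)^* has
# frequencies up to 2(K-1) = 14 < N, so the rule is exact up to rounding.
N = 64
t = -0.5 + np.arange(N) / N
integral = np.sum(cosine_series(alpha, t) * np.conj(cosine_series(beta, t))) / N
series = np.sum(alpha * np.conj(beta))   # <alpha, beta> in l2
print(integral, series)
```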
Figure 1.19: Conceptual illustration of the isometry between a separable Hilbert space
$H$ and sequence space $\ell^2(\mathcal{K})$ induced by an orthonormal basis $\Phi$. It preserves geometry as
$\langle x, y\rangle = \langle \alpha, \beta\rangle$.



Unitary Synthesis and Analysis  We will show

$$ \Phi^*\Phi = I \quad \text{on } \ell^2(\mathcal{K}). \tag{1.92} $$

Combined with (1.86), this establishes that the analysis and synthesis operators
associated with an orthonormal basis are unitary.

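To verify (1.92), make the following computation for any sequence $\alpha$ in $\ell^2(\mathcal{K})$:

$$ \Phi^*\Phi\alpha \overset{(a)}{=} \Phi^*\Big(\sum_{i\in\mathcal{K}} \alpha_i\varphi_i\Big) \overset{(b)}{=} \Big\{\Big\langle \sum_{i\in\mathcal{K}} \alpha_i\varphi_i, \varphi_k\Big\rangle\Big\}_{k\in\mathcal{K}} \overset{(c)}{=} \Big\{\sum_{i\in\mathcal{K}} \alpha_i\langle \varphi_i, \varphi_k\rangle\Big\}_{k\in\mathcal{K}} \overset{(d)}{=} \Big\{\sum_{i\in\mathcal{K}} \alpha_i\delta_{i-k}\Big\}_{k\in\mathcal{K}} \overset{(e)}{=} \{\alpha_k\}_{k\in\mathcal{K}} = \alpha, \tag{1.93} $$

where (a) follows from (1.81); (b) from (1.82); (c) from the linearity in the first
argument of the inner product; (d) from the orthonormality of the set $\{\varphi_k\}_{k\in\mathcal{K}}$, (1.83);
and (e) from the definition of the Kronecker delta sequence, (1.9).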

Isometry of Separable Hilbert Spaces and $\ell^2(\mathcal{K})$  The fact that the synthesis
and analysis operators $\Phi$ and $\Phi^*$ associated with an orthonormal basis are unitary
leads to key intuitions about separable Hilbert spaces. A unitary operator between
Hilbert spaces puts the spaces in one-to-one correspondence while preserving their
geometries (that is, their inner products). Since the Hilbert spaces that we consider
are separable, they contain orthonormal bases. Therefore, these Hilbert spaces
can all be put in one-to-one correspondence with $\mathbb{C}^N$ if they are finite-dimensional
or with $\ell^2(\mathbb{Z})$ if they are infinite-dimensional, as illustrated in Figure 1.19. (The
notation $\ell^2(\mathcal{K})$, with $\mathcal{K}$ finite or countably infinite, unifies the two cases.)

Orthogonal Projection  Truncating the orthonormal expansion (1.85a) to $k \in \mathcal{I}$,
where $\mathcal{I}$ is a subset of the full index set $\mathcal{K}$, $\mathcal{I} \subset \mathcal{K}$, gives
an orthogonal projection. As in the alternate proof of Theorem 1.39, this can be
verified through somewhat tedious manipulations of sums and inner products; it is
simpler to extend the definitions of the synthesis and analysis operators to apply
for $\mathcal{I} \subset \mathcal{K}$, and then use these new operators.

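Define the synthesis operator associated with $\{\varphi_k\}_{k\in\mathcal{I}}$ as

$$ \Phi_{\mathcal{I}} : \ell^2(\mathcal{I}) \to H, \quad \text{with} \quad \Phi_{\mathcal{I}}\alpha = \sum_{k\in\mathcal{I}} \alpha_k\varphi_k. \tag{1.94} $$

This follows the form of (1.81) exactly, but the subscript $\mathcal{I}$ emphasizes that $\{\varphi_k\}_{k\in\mathcal{I}}$
may not be a basis. The adjoint of $\Phi_{\mathcal{I}}$, called the analysis operator associated with
$\{\varphi_k\}_{k\in\mathcal{I}}$, is

$$ \Phi_{\mathcal{I}}^* : H \to \ell^2(\mathcal{I}), \quad \text{with} \quad (\Phi_{\mathcal{I}}^* x)_k = \langle x, \varphi_k\rangle, \quad k \in \mathcal{I}. \tag{1.95} $$

Following the same steps as in (1.93), the orthonormality of the set $\{\varphi_k\}_{k\in\mathcal{I}}$ is
equivalent to

$$ \Phi_{\mathcal{I}}^*\Phi_{\mathcal{I}} = I \quad \text{on } \ell^2(\mathcal{I}). \tag{1.96} $$

However, we cannot conclude that $\Phi_{\mathcal{I}}$ and $\Phi_{\mathcal{I}}^*$ are unitary because the product
$\Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^*$ is not, in general, the identity operator on $H$; $\Phi_{\mathcal{I}}$ not being a basis, it cannot
reconstruct every $x \in H$. Instead, $\Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^*$ is an orthogonal projection operator that
is an identity only when $\mathcal{I} = \mathcal{K}$, that is, when $\{\varphi_k\}_{k\in\mathcal{I}}$ is a basis. This is formalized
in the following theorem: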

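Theorem 1.40 Given an orthonormal set $\Phi = \{\varphi_k\}_{k\in\mathcal{I}} \subset H$,

$$ P_{\mathcal{I}}x = \sum_{k\in\mathcal{I}} \langle x, \varphi_k\rangle \varphi_k \tag{1.97a} $$

$$ = \Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^* x \tag{1.97b} $$

is the orthogonal projection of $x$ onto $S_{\mathcal{I}} = \overline{\operatorname{span}}(\{\varphi_k\}_{k\in\mathcal{I}})$.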



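Proof. From its definition (1.97a), $P_{\mathcal{I}}$ is clearly a linear operator on $H$ with range
contained in $S_{\mathcal{I}}$. To prove that $P_{\mathcal{I}}$ is an orthogonal projection operator, we show that
it is idempotent and self-adjoint (see Definition 1.27).

The operator $P_{\mathcal{I}}$ is idempotent because, for any $x \in H$,

$$ P_{\mathcal{I}}(P_{\mathcal{I}}x) \overset{(a)}{=} \Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^*(\Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^* x) \overset{(b)}{=} \Phi_{\mathcal{I}}(\Phi_{\mathcal{I}}^*\Phi_{\mathcal{I}})\Phi_{\mathcal{I}}^* x \overset{(c)}{=} \Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^* x \overset{(d)}{=} P_{\mathcal{I}}x, $$

where (a) follows from (1.97b); (b) from associativity of linear operators; (c) from (1.96);
and (d) from (1.97b). This shows that $P_{\mathcal{I}}$ is a projection operator. The operator $P_{\mathcal{I}}$ is
self-adjoint because

$$ P_{\mathcal{I}}^* = (\Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^*)^* \overset{(a)}{=} (\Phi_{\mathcal{I}}^*)^*\Phi_{\mathcal{I}}^* \overset{(b)}{=} \Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^* = P_{\mathcal{I}}, $$

where (a) follows from Theorem 1.21(viii) and (b) from Theorem 1.21(iii). Combined
with the previous computation, this shows that $P_{\mathcal{I}}$ is an orthogonal projection operator.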
The previous theorem can be used to simplify the computation of an orthogonal
projection provided $\{\varphi_k\}_{k\in\mathcal{I}}$ is an orthonormal basis for the subspace of interest.

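Example 1.33 (Orthogonal projection with orthonormal basis) Consider
the orthonormal basis for $\mathbb{C}^3$ from Example 1.30. The 2-dimensional subspace

$$ S = \{[x_0 \;\; x_1 \;\; x_2]^\top \in \mathbb{C}^3 \mid x_1 = x_0 + x_2\} $$

is $\operatorname{span}(\{\varphi_0, \varphi_1\})$. Therefore, using (1.97a), the orthogonal projection onto $S$ is
given by

$$ P_S x = \sum_{k=0}^{1} \langle x, \varphi_k\rangle \varphi_k. $$

To see explicitly that this is an orthogonal projection operator:

$$ \begin{aligned} P_S x &= \langle x, \varphi_0\rangle\varphi_0 + \langle x, \varphi_1\rangle\varphi_1 \overset{(a)}{=} \varphi_0\langle x, \varphi_0\rangle + \varphi_1\langle x, \varphi_1\rangle \\ &\overset{(b)}{=} \varphi_0\varphi_0^* x + \varphi_1\varphi_1^* x \overset{(c)}{=} (\varphi_0\varphi_0^* + \varphi_1\varphi_1^*)\, x = \frac{1}{3}\begin{bmatrix} 2 & 1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix} x, \end{aligned} $$

where (a) follows from the inner product being a scalar; (b) from writing the
inner product as a product of a row vector and a column vector; and (c) from the
distributive property of matrix-vector multiplication. The matrix representation
of $P_S$ is idempotent (as can be verified with a straightforward computation) and
obviously Hermitian.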
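The matrix of $P_S$ and its two defining properties can be confirmed in a few lines of Python/NumPy (a sketch; the test vector `x` is arbitrary). The last checks confirm that the projected vector lies in $S$, that is, its middle component equals the sum of the other two, and that the residual is orthogonal to $S$.

```python
import numpy as np

phi0 = np.array([1, 1, 0]) / np.sqrt(2)
phi1 = np.array([-1, 1, 2]) / np.sqrt(6)

# P_S = phi_0 phi_0^* + phi_1 phi_1^*  (sum of rank-one orthogonal projections)
P = np.outer(phi0, phi0.conj()) + np.outer(phi1, phi1.conj())
print(np.round(3 * P, 12))                        # compare with [[2, 1, -1], [1, 2, 1], [-1, 1, 2]]
print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))   # idempotent and Hermitian

x = np.array([3.0, -1.0, 2.0])
p = P @ x
print(np.isclose(p[1], p[0] + p[2]))              # P_S x lies in S
print(np.isclose(np.vdot(phi0, x - p), 0), np.isclose(np.vdot(phi1, x - p), 0))
```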

Orthogonal Decomposition  Given the orthogonal projection interpretation from
Theorem 1.40, term $k$ in the orthonormal expansion formula (1.85a) is the orthogonal
projection of $x$ to the subspace $S_{\{k\}} = \operatorname{span}(\varphi_k)$ (recall also Example 1.19). So
(1.85a) writes any $x$ uniquely as a sum of its orthogonal projections onto orthogonal
1-dimensional subspaces $\{S_{\{k\}}\}_{k\in\mathcal{K}}$. In other words, an orthonormal basis induces
an orthogonal decomposition

$$ H = \bigoplus_{k\in\mathcal{K}} S_{\{k\}} \tag{1.98} $$

while providing a simple way to compute the components of the decomposition
of any $x \in H$. The expansion formula (1.85a) will be applied countless times in
subsequent chapters, and orthogonal decompositions of the Hilbert spaces $\ell^2(\mathbb{Z})$ and
$\mathcal{L}^2(\mathbb{R})$ will be a recurring theme.



Best Approximation  The simple form of (1.97a) makes certain sequences of
orthogonal projections extremely easy to compute. Let $\hat{x}^{(k)}$ denote the best
approximation of $x$ in the subspace spanned by the orthonormal set $\{\varphi_0, \varphi_1, \ldots, \varphi_{k-1}\}$.
Then $\hat{x}^{(0)} = 0$ and

$$ \hat{x}^{(k+1)} = \hat{x}^{(k)} + \langle x, \varphi_k\rangle \varphi_k \quad \text{for } k = 0, 1, \ldots, \tag{1.99} $$

that is, the new best approximation is the previous best approximation
plus the orthogonal projection onto the span of the added vector $\varphi_k$. This follows
from the projection theorem (Theorem 1.26) and comparing (1.97a) with index sets
$\{0, 1, \ldots, k-1\}$ and $\{0, 1, \ldots, k\}$.

The recursive computation (1.99) is called successive approximation; it arises
from the interest in nested subspaces and having orthonormal bases for those
subspaces. Nested subspaces arise in practice quite frequently. For example, suppose
we wish to find an approximation of a function $x$ by a polynomial of minimal degree
that meets an approximation error criterion. Then if $\{\varphi_k\}_{k\in\mathbb{N}}$ is an orthonormal
set such that, for each $M$, $\{\varphi_0, \varphi_1, \ldots, \varphi_M\}$ is a basis for the polynomials of degree at most $M$,
we can apply the recursion (1.99) until the error criterion is met. Gram-Schmidt
orthogonalization, discussed below, is a way to find the desired set $\{\varphi_k\}_{k\in\mathbb{N}}$, and
approximation by polynomials is covered in detail in Section 5.2.



Bessel's Inequality  While Bessel's inequality is similar to Parseval's equality (1.87a),
it holds for any orthonormal set, even if that set is not a basis. When it holds with
equality for all vectors in a Hilbert space (giving Parseval's equality), the
orthonormal set must be a basis.



Theorem 1.41 (Bessel's inequality) Given an orthonormal set $\Phi = \{\varphi_k\}_{k\in\mathcal{I}}$ in a Hilbert space $H$, Bessel's inequality holds:

$$ \|x\|^2 \ge \sum_{k\in\mathcal{I}} |\langle x, \varphi_k\rangle|^2 \tag{1.100a} $$

$$ = \|\Phi_{\mathcal{I}}^* x\|^2. \tag{1.100b} $$

Equality for every $x$ in $H$ implies that the set $\Phi$ is complete in $H$, so the orthonormal set is an orthonormal basis for $H$; (1.100) is then Parseval's equality (1.87).









Proof. Let $S = \overline{\operatorname{span}}(\Phi)$ and $x_S = \Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^* x$. By (1.97b), $x_S$ is the orthogonal projection
of $x$ onto $S$. Thus, by the projection theorem (Theorem 1.26), $x - x_S \perp x_S$. From this
we conclude

$$ \|x\|^2 \overset{(a)}{=} \|x_S\|^2 + \|x - x_S\|^2 \overset{(b)}{\ge} \|x_S\|^2 = \|\Phi_{\mathcal{I}}\Phi_{\mathcal{I}}^* x\|^2 \overset{(c)}{=} \|\Phi_{\mathcal{I}}^* x\|^2 \overset{(d)}{=} \sum_{k\in\mathcal{I}} |\langle x, \varphi_k\rangle|^2, $$

where (a) follows from the Pythagorean theorem (1.26a); (b) from the nonnegativity
of the norm of a vector; (c) from (1.96); and (d) from the definition of the analysis
operator, (1.95).

Step (b) holds with equality for every $x$ in $H$ if and only if $x = x_S$ for every $x$ in
$H$. This occurs if and only if $S = H$, in which case we have that the set $\Phi$ is complete
and thus an orthonormal basis for $H$.

For the case when the orthonormal set $\{\varphi_k\}_{k\in\mathcal{I}}$ is not complete, Bessel's inequality is
especially easy to understand by extending the set to an orthonormal basis $\{\varphi_k\}_{k\in\mathcal{K}}$
with $\mathcal{K} \supset \mathcal{I}$. Then Bessel's inequality follows from Parseval's equality because
$\sum_{k\in\mathcal{I}} |\langle x, \varphi_k\rangle|^2$ simply omits some nonnegative terms from $\sum_{k\in\mathcal{K}} |\langle x, \varphi_k\rangle|^2$. The
following example illustrates this in $\mathbb{R}^3$.

Figure 1.20: Illustration of Bessel's inequality in $\mathbb{R}^3$.

Example 1.34 (Bessel's inequality) Let $\varphi_0 = [1 \;\; 0 \;\; 0]^\top$ and $\varphi_1 = [0 \;\; 1 \;\; 0]^\top$.
These vectors are the first two elements of the standard basis in $\mathbb{R}^3$, and they
are orthonormal. As illustrated in Figure 1.20, the norm of a vector $x \in \mathbb{R}^3$ is at
least as large as the norm of its projection onto the $(\varphi_0, \varphi_1)$-plane, $x_{01}$:

$$ \|x\|^2 \ge \|x_{01}\|^2 = |\langle x, \varphi_0\rangle|^2 + |\langle x, \varphi_1\rangle|^2. $$

Adding $\varphi_2 = [0 \;\; 0 \;\; 1]^\top$ to the set gives an orthonormal basis (the standard
basis), and adding the square of the length of the orthogonal projection of $x$
onto the span of $\varphi_2$ yields Parseval's equality:

$$ \|x\|^2 = \sum_{k=0}^{2} |\langle x, \varphi_k\rangle|^2. $$
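Example 1.34 in Python/NumPy, as a sketch with an arbitrary random `x`: the partial sum over $\varphi_0, \varphi_1$ never exceeds $\|x\|^2$, and the gap is exactly the omitted term $|\langle x, \varphi_2\rangle|^2 = x_2^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(3)
E = np.eye(3)                                     # columns e_0, e_1, e_2

partial = np.sum(np.abs(E[:, :2].T @ x) ** 2)     # |<x, phi_0>|^2 + |<x, phi_1>|^2
full = np.sum(np.abs(E.T @ x) ** 2)               # all three terms
norm_sq = np.linalg.norm(x) ** 2
print(norm_sq >= partial, np.isclose(norm_sq, full))   # Bessel, then Parseval
```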



Gram-Schmidt Orthogonalization  We have thus far discussed properties of
orthonormal bases and checked whether a set is an orthonormal basis. We now show
how to construct an orthonormal basis for a space specified by a set of linearly
independent vectors $\{x_k\}_{k\in\mathcal{K}} \subset H$. For notational convenience, assume $\mathcal{K}$ is a set
of consecutive integers starting at 0, so $\mathcal{K} = \{0, 1, \ldots, N-1\}$ or $\mathcal{K} = \mathbb{N}$.
The goal is to find an orthonormal set $\{\varphi_k\}_{k\in\mathcal{K}}$ with

$$ \overline{\operatorname{span}}(\{\varphi_k\}_{k\in\mathcal{K}}) = \overline{\operatorname{span}}(\{x_k\}_{k\in\mathcal{K}}). \tag{1.101a} $$

Thus, when $\{x_k\}_{k\in\mathcal{K}}$ is a basis for $H$, the constructed set $\{\varphi_k\}_{k\in\mathcal{K}}$ is an
orthonormal basis for $H$; otherwise, $\{\varphi_k\}_{k\in\mathcal{K}}$ is an orthonormal basis for the smaller space
$\overline{\operatorname{span}}(\{x_k\}_{k\in\mathcal{K}})$, which is itself a Hilbert space.

There are many orthonormal bases for $\overline{\operatorname{span}}(\{x_k\}_{k\in\mathcal{K}})$. By requiring a stronger
condition

$$ \operatorname{span}(\{\varphi_k\}_{k=0}^{i}) = \operatorname{span}(\{x_k\}_{k=0}^{i}) \quad \text{for every } i \in \mathcal{K}, \tag{1.101b} $$
Figure 1.21: Illustration of Gram-Schmidt orthogonalization. (a) Input vectors
$\{x_0, x_1\}$. (b) The first output vector $\varphi_0$ is a normalized version of $x_0$. (c) The
projection of $x_1$ onto the subspace spanned by $\varphi_0$ is subtracted from $x_1$ to obtain a residual
$x_1 - v_1$. (d) The second output vector $\varphi_1$ is a normalized version of the residual.



the solution becomes essentially unique. Furthermore, enforcing (1.101b) for
increasing values of $i$ leads to a simple recursive procedure. Figure 1.21 illustrates
the orthogonalization procedure for two vectors in a plane (an initial, nonorthonormal
basis). For example, for $i = 0$, (1.101b) holds when $\varphi_0$ is a scalar multiple of $x_0$.
For $\varphi_0$ to have unit norm, it is natural to choose

$$ \varphi_0 = x_0/\|x_0\|, $$

as illustrated in Figure 1.21(b), and the set of all possible solutions is obtained
by including a unit-modulus scalar factor. Then for (1.101b) to hold for $i = 1$,
the vector $\varphi_1$ must be aligned with the component of $x_1$ orthogonal to $\varphi_0$, as
illustrated in Figure 1.21(c). This is achieved when $\varphi_1$ is a scalar multiple of the
residual of orthogonally projecting $x_1$ to the subspace spanned by $\varphi_0$, as illustrated
in Figure 1.21(d):

$$ \varphi_1 = \frac{x_1 - \langle x_1, \varphi_0\rangle\varphi_0}{\|x_1 - \langle x_1, \varphi_0\rangle\varphi_0\|}. $$

In general, $\varphi_k$ is determined by normalizing the residual of orthogonally projecting
$x_k$ to $\operatorname{span}(\{\varphi_0, \varphi_1, \ldots, \varphi_{k-1}\})$. The residual is nonzero because otherwise the
linear independence of $\{x_0, x_1, \ldots, x_k\}$ is contradicted. The full recursive
computation is summarized in Table 1.1.

Example 1.35 (Gram-Schmidt orthogonalization)

(i) Let $x_0 = [1 \;\; 1 \;\; 0]^\top$, $x_1 = [0 \;\; 1 \;\; 1]^\top$, and $x_2 = [1 \;\; 1 \;\; 1]^\top$. These are
linearly independent, and following the steps in Table 1.1 first yields $\varphi_0 = \frac{1}{\sqrt{2}}[1 \;\; 1 \;\; 0]^\top$, then $v_1 = \frac12[1 \;\; 1 \;\; 0]^\top$, then $x_1 - v_1 = \frac12[-1 \;\; 1 \;\; 2]^\top$,
and $\varphi_1 = \frac{1}{\sqrt{6}}[-1 \;\; 1 \;\; 2]^\top$. For the final basis vector, $v_2 = \frac13[2 \;\; 4 \;\; 2]^\top$,
$x_2 - v_2 = \frac13[1 \;\; -1 \;\; 1]^\top$, and $\varphi_2 = \frac{1}{\sqrt{3}}[1 \;\; -1 \;\; 1]^\top$. The set $\{\varphi_0, \varphi_1, \varphi_2\}$
is the orthonormal basis from Examples 1.30 and 1.33. Since $\operatorname{span}(\{\varphi_0, \varphi_1\})
= \operatorname{span}(\{x_0, x_1\})$ and the latter span is plainly the range of matrix $B$ in
Example 1.20, we can retrospectively see that the projection operators in
Examples 1.20 and 1.33 project to the same subspace. (One projection
operator is orthogonal and the other is oblique.)

Gram-Schmidt Orthogonalization

Input: An ordered sequence of linearly independent vectors $\{x_k\}_{k\in\mathcal{K}} \subset H$
Output: Orthonormal vectors $\{\varphi_k\}_{k\in\mathcal{K}} \subset H$, with $\operatorname{span}(\{\varphi_k\}) = \operatorname{span}(\{x_k\})$

$\{\varphi_k\}$ = GramSchmidt($\{x_k\}$)
    $\varphi_0 = x_0/\|x_0\|$
    $k = 1$
    while $x_k$ exists do
        project: $v_k = \sum_{i=0}^{k-1} \langle x_k, \varphi_i\rangle \varphi_i$
        normalize: $\varphi_k = (x_k - v_k)/\|x_k - v_k\|$
        increment $k$
    end while
    return $\{\varphi_k\}$

Table 1.1: Gram-Schmidt orthogonalization algorithm.

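(ii) Starting with $x_0 = [1 \;\; 1 \;\; 0]^\top$ and $x_1 = [0 \;\; 1 \;\; 1]^\top$ would again yield
$\varphi_0 = \frac{1}{\sqrt{2}}[1 \;\; 1 \;\; 0]^\top$ and $\varphi_1 = \frac{1}{\sqrt{6}}[-1 \;\; 1 \;\; 2]^\top$. Now $\{\varphi_0, \varphi_1\}$ is obviously
too small to be a basis for $\mathbb{C}^3$. Instead, it is an orthonormal basis for the
2-dimensional space $\operatorname{span}(\{x_0, x_1\})$. As discussed in Examples 1.20 and 1.33,
this space is the set of 3-tuples with middle component equal to the sum of
the first and last.

(iii) Starting with $x_0 = [1 \;\; 1 \;\; 1]^\top$, $x_1 = [1 \;\; 1 \;\; 0]^\top$, and $x_2 = [0 \;\; 1 \;\; 1]^\top$,
the same set of vectors as in (i) but in a different order, yields

$$ \varphi_0 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \varphi_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}, \quad\text{and}\quad \varphi_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}. $$

There is no obvious relationship between this orthonormal basis and the
one found in part (i).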
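Table 1.1 can be transcribed almost line for line into Python/NumPy. The sketch below is classical Gram-Schmidt as in the table, with the name `gram_schmidt` chosen for illustration; it reproduces parts (i) and (iii) of the example. For long or nearly dependent input sequences, the modified Gram-Schmidt variant or a QR factorization such as `numpy.linalg.qr` is usually preferred, because classical Gram-Schmidt can lose orthogonality in floating-point arithmetic.

```python
import numpy as np


def gram_schmidt(xs):
    """Classical Gram-Schmidt following Table 1.1.

    xs: sequence of linearly independent vectors. Returns a list of orthonormal arrays."""
    phis = []
    for x in xs:
        x = np.asarray(x, dtype=complex)
        v = sum((np.vdot(p, x) * p for p in phis), np.zeros_like(x))  # project: sum_i <x_k, phi_i> phi_i
        r = x - v                                                      # residual, nonzero by independence
        phis.append(r / np.linalg.norm(r))                             # normalize
    return phis


part_i = gram_schmidt([[1, 1, 0], [0, 1, 1], [1, 1, 1]])
part_iii = gram_schmidt([[1, 1, 1], [1, 1, 0], [0, 1, 1]])
for phi in part_iii:
    print(np.round(phi.real, 4))
```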

Solved Exercise 1.5 applies Gram-Schmidt orthogonalization to derive normalized
Legendre polynomials, which are polynomials orthogonal in $\mathcal{L}^2([-1, 1])$.



1.5.3 Biorthogonal Pairs of Bases 

Orthonormal bases have many advantages over other bases, including simple
expressions for expansion in (1.85) and orthogonal projection in (1.97). While there
are no general disadvantages caused directly by orthonormality, in some settings
nonorthonormal bases have their advantages, too. For example, of the bases

$$ \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\} \quad\text{and}\quad \left\{ \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{6}}\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} \right\} $$

from Example 1.35(i), the nonorthogonal basis is easier to store and compute with.
Solved Exercise 1.5 provides a more dramatic example, since the set of functions
$\{1, t, t^2, \ldots, t^N\}$ is certainly simpler for many purposes than the Legendre
polynomials up to degree $N$.

A basis does not have to be orthonormal to provide unique expansions. The
sacrifice we must make is that we cannot ask for a single set of vectors to serve both the
analysis role in $x \mapsto \alpha = \{\langle x, \tilde{\varphi}_k\rangle\}_{k\in\mathcal{K}}$ and the synthesis role in $\alpha \mapsto \sum_{k\in\mathcal{K}} \alpha_k\varphi_k$.
This leads us to the concept of a biorthogonal pair of bases, or dual bases.



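Definition 1.42 (Biorthogonal pair of bases) The sets of vectors $\Phi = \{\varphi_k\}_{k\in\mathcal{K}} \subset H$ and $\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k\in\mathcal{K}} \subset H$, where $\mathcal{K}$ is finite or countably infinite, are called a biorthogonal pair of bases for Hilbert space $H$ when

(i) each is a basis for $H$; and

(ii) they are biorthogonal, meaning

$$ \langle \varphi_i, \tilde{\varphi}_k\rangle = \delta_{i-k} \quad \text{for every } i, k \in \mathcal{K}. \tag{1.102} $$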



Since the inner product has Hermitian symmetry and $\delta_{i-k}$ is real, the roles of the
sets $\Phi$ and $\tilde{\Phi}$ can be reversed with no change in whether (1.102) holds. We will
generally maintain a convention of using the basis $\Phi$ in synthesis and the basis $\tilde{\Phi}$
in analysis, with the understanding that the bases can be swapped in any of the
results that follow. To each basis we associate synthesis and analysis operators
defined through (1.81) and (1.82); a biorthogonal pair of bases thus yields four
operators: $\Phi$, $\Phi^*$, $\tilde{\Phi}$, and $\tilde{\Phi}^*$.

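Example 1.36 (Finite-dimensional biorthogonal pair of bases) The sets

$$ \varphi_0 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \varphi_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \varphi_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \tilde{\varphi}_0 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \tilde{\varphi}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad \tilde{\varphi}_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} $$

are a biorthogonal pair of bases for $\mathbb{C}^3$, as can be verified by direct computation.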
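The direct computation is again a matrix product. In this Python/NumPy sketch (variable names ours), the columns of `Phi` and `Phi_tilde` hold $\varphi_k$ and $\tilde\varphi_k$; entry $[k, i]$ of $\tilde\Phi^*\Phi$ is $\langle\varphi_i, \tilde\varphi_k\rangle$, so (1.102) says this product is the identity. In $\mathbb{C}^N$ that is the same as saying $\tilde\Phi^*$ is the matrix inverse of $\Phi$, which is also one way to compute the dual basis from $\Phi$.

```python
import numpy as np

Phi = np.array([[1, 0, 1],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)          # columns phi_0, phi_1, phi_2
Phi_tilde = np.array([[0, -1, 1],
                      [1, 1, -1],
                      [-1, 0, 1]], dtype=float)   # columns tilde-phi_0, tilde-phi_1, tilde-phi_2

# (1.102): entry [k, i] of Phi_tilde^* Phi is <phi_i, tilde-phi_k> = delta_{i-k}
print(Phi_tilde.conj().T @ Phi)
print(np.allclose(Phi_tilde.conj().T, np.linalg.inv(Phi)))   # dual basis from the inverse
```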
Example 1.37 (Biorthogonal pair of bases of cosine functions) Define
$\Psi = \{\psi_k\}_{k\in\mathbb{N}} \subset \mathcal{L}^2([-\tfrac12, \tfrac12])$ and $\tilde{\Psi} = \{\tilde{\psi}_k\}_{k\in\mathbb{N}} \subset \mathcal{L}^2([-\tfrac12, \tfrac12])$ by

$$ \psi_0(t) = 1 = \varphi_0(t), \tag{1.103a} $$

$$ \psi_k(t) = \sqrt{2}\cos(2\pi kt) + \tfrac12\sqrt{2}\cos(2\pi(k+1)t) = \varphi_k(t) + \tfrac12\varphi_{k+1}(t), \quad k = 1, 2, \ldots, \tag{1.103b} $$

$$ \tilde{\psi}_0(t) = 1 = \varphi_0(t), \tag{1.103c} $$

$$ \tilde{\psi}_k(t) = \sum_{m=1}^{k} \big(-\tfrac12\big)^{k-m}\sqrt{2}\cos(2\pi mt) = \sum_{m=1}^{k} \big(-\tfrac12\big)^{k-m}\varphi_m(t), \quad k = 1, 2, \ldots, \tag{1.103d} $$

where $\{\varphi_k\}_{k\in\mathbb{N}}$ are the orthonormal basis functions from (1.21). The first few
functions in each of these sets are shown in Figure 1.22. Verifying that (1.102)
holds is only part of proving that the sets $\Psi$ and $\tilde{\Psi}$ form a biorthogonal pair of
bases; this is left for Exercise 1.43. We must also verify that each set is a basis
for the same subspace of $\mathcal{L}^2([-\tfrac12, \tfrac12])$.

Figure 1.22: Elements of the biorthogonal pair of bases $\Psi$ (solid lines) and $\tilde{\Psi}$ (dashed
lines) in Example 1.37. Panels (a) to (f) show $\psi_k(t)$ and $\tilde{\psi}_k(t)$ for $k = 0, 1, \ldots, 5$.

By construction, $\{\varphi_k\}_{k\in\mathbb{N}}$ forms an orthonormal basis for the closure of
its span, $S$. We will see that the sets $\Psi$ and $\tilde{\Psi}$ are also bases for $S$. The set
$\Psi$ is linearly independent because no function $\psi_i$ can be written as a linear
combination of $\{\psi_k\}_{k=0}^{i-1}$; this follows from $\psi_i$ containing a higher-frequency cosine
than any lower-numbered element from the set. The set $\tilde{\Psi}$ is similarly linearly
independent. The closure of the span of $\Psi$ and $S$ are equal: $\overline{\operatorname{span}}(\Psi) \subseteq S$
because each $\psi_k$ is a linear combination of one or two $\varphi_k$s, and $S \subseteq \overline{\operatorname{span}}(\Psi)$
because $\varphi_0 = \psi_0$ and, for each $k \in \mathbb{Z}^+$, $\varphi_k$ can be written as an infinite linear
combination of $\psi_k$s; a detailed argument is left for Exercise 1.43. Similarly, the
closure of the span of the set $\tilde{\Psi}$ is equal to $S$.
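Biorthogonality (1.102) for Example 1.37 can be checked without any integration: by Parseval's equality (1.88), each inner product $\langle\psi_i, \tilde\psi_k\rangle$ equals the $\ell^2$ inner product of the coefficient sequences of $\psi_i$ and $\tilde\psi_k$ in the orthonormal basis $\{\varphi_m\}$. The Python/NumPy sketch below assumes truncation to $\varphi_0, \ldots, \varphi_{K-1}$, which represents $\psi_0, \ldots, \psi_{K-2}$ and $\tilde\psi_0, \ldots, \tilde\psi_{K-1}$ exactly; the value of `K` is an arbitrary choice.

```python
import numpy as np

K = 12                                   # keep phi_0, ..., phi_{K-1}
Psi = np.zeros((K - 1, K))               # row k: coefficients of psi_k, k = 0, ..., K-2
Psi_t = np.zeros((K, K))                 # row k: coefficients of tilde-psi_k, k = 0, ..., K-1

Psi[0, 0] = 1.0                          # psi_0 = phi_0, (1.103a)
for k in range(1, K - 1):
    Psi[k, k], Psi[k, k + 1] = 1.0, 0.5  # psi_k = phi_k + (1/2) phi_{k+1}, (1.103b)

Psi_t[0, 0] = 1.0                        # tilde-psi_0 = phi_0, (1.103c)
for k in range(1, K):
    for m in range(1, k + 1):
        Psi_t[k, m] = (-0.5) ** (k - m)  # (1.103d)

cross = Psi @ Psi_t.T                    # entry [i, k] = <psi_i, tilde-psi_k> (real coefficients)
print(np.allclose(cross, np.eye(K - 1, K)))
```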

Expansion and Inner Product Computation With a biorthogonal pair of bases, 
expansion coefficients with respect to one basis are computed using the other basis. 



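Theorem 1.43 (Biorthogonal basis expansions) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ and
$\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k\in\mathcal{K}}$ be a biorthogonal pair of bases for Hilbert space $H$. The unique
expansion with respect to the basis $\Phi$ of any $x$ in $H$ has expansion coefficients

$$ \alpha_k = \langle x, \tilde{\varphi}_k\rangle \quad \text{for } k \in \mathcal{K}, \tag{1.104a} $$

or,

$$ \alpha = \tilde{\Phi}^* x. \tag{1.104b} $$

Synthesis with these coefficients yields

$$ x = \sum_{k\in\mathcal{K}} \langle x, \tilde{\varphi}_k\rangle \varphi_k \tag{1.105a} $$

$$ = \Phi\alpha = \Phi\tilde{\Phi}^* x. \tag{1.105b} $$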



Proof. The proof parallels the proof of Theorem 1.38 with minor modifications based
on replacing the orthonormality condition (1.83) with the biorthogonality condition (1.102).

The existence of a unique linear combination of the form (1.79) is guaranteed by
the set $\Phi$ being a basis. The validity of (1.105a) with coefficients (1.104a) follows from
the following computation:

$$ \langle x, \tilde{\varphi}_k\rangle \overset{(a)}{=} \Big\langle \sum_{i\in\mathcal{K}} \alpha_i\varphi_i, \tilde{\varphi}_k \Big\rangle \overset{(b)}{=} \sum_{i\in\mathcal{K}} \alpha_i \langle \varphi_i, \tilde{\varphi}_k\rangle \overset{(c)}{=} \sum_{i\in\mathcal{K}} \alpha_i \delta_{i-k} \overset{(d)}{=} \alpha_k, $$

where (a) follows from (1.79); (b) from the linearity in the first argument of the inner
product; (c) from the biorthogonality of the sets $\Phi$ and $\tilde{\Phi}$, (1.102); and (d) from the
definition of the Kronecker delta sequence, (1.9).

The expressions (1.104b) and (1.105b) are equivalent to (1.104a) and (1.105a)
using the operators defined in (1.81) and (1.82).

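Reversing the roles of the bases $\Phi$ and $\tilde{\Phi}$ gives expansion coefficients with respect
to the basis $\tilde{\Phi}$:

$$ \tilde{\alpha}_k = \langle x, \varphi_k\rangle \quad \text{for } k \in \mathcal{K}, \quad\text{or,}\quad \tilde{\alpha} = \Phi^* x, \tag{1.106} $$

with the corresponding expansion

$$ x = \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \tilde{\varphi}_k = \tilde{\Phi}\Phi^* x. \tag{1.107} $$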
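Both expansions can be exercised with the pair from Example 1.36. In this Python/NumPy sketch (an arbitrary complex test vector `x`, names ours), the coefficients with respect to one basis are obtained by analysis with the other, and synthesis recovers `x` either way.

```python
import numpy as np

Phi = np.array([[1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=complex)       # columns phi_k
Phi_t = np.array([[0, -1, 1], [1, 1, -1], [-1, 0, 1]], dtype=complex)  # columns tilde-phi_k

x = np.array([2.0, -1.0, 3.0 + 1.0j])
alpha = Phi_t.conj().T @ x     # (1.104): alpha_k = <x, tilde-phi_k>
alpha_t = Phi.conj().T @ x     # (1.106): tilde-alpha_k = <x, phi_k>

print(np.allclose(Phi @ alpha, x))      # (1.105): x = Phi Phi_t^* x
print(np.allclose(Phi_t @ alpha_t, x))  # (1.107): x = Phi_t Phi^* x
```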
The theorem shows that a biorthogonal pair of bases can together do the job of
an orthonormal basis in terms of signal expansion. The most interesting properties
of the synthesis and analysis operators involve both bases of the pair. Since (1.105b)
holds for all $x$ in $H$,

$$ \Phi\tilde{\Phi}^* = I \quad \text{on } H. \tag{1.108} $$

This leads to an analogue of Theorem 1.39:



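Theorem 1.44 (Parseval's equalities for biorthogonal pairs of bases) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ and $\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k\in\mathcal{K}}$ be a biorthogonal pair of bases for Hilbert space $H$. Expansion with respect to the bases $\Phi$ and $\tilde{\Phi}$ with coefficients (1.104) and (1.106) satisfies

$$ \|x\|^2 = \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \langle x, \tilde{\varphi}_k\rangle^* \tag{1.109a} $$

$$ = \langle \Phi^* x, \tilde{\Phi}^* x\rangle = \langle \tilde{\alpha}, \alpha\rangle. \tag{1.109b} $$

More generally,

$$ \langle x, y\rangle = \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \langle y, \tilde{\varphi}_k\rangle^* \tag{1.110a} $$

$$ = \langle \Phi^* x, \tilde{\Phi}^* y\rangle = \langle \tilde{\alpha}, \beta\rangle. \tag{1.110b} $$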



Proof. We will prove (1.110b); (1.110a) is the same fact expanded with the definitions
of $\Phi^*$ and $\tilde{\Phi}^*$, and equalities (1.109) follow by setting $x = y$. For any $x$ and $y$ in $H$,

$$ \langle \Phi^* x, \tilde{\Phi}^* y\rangle \overset{(a)}{=} \langle x, \Phi\tilde{\Phi}^* y\rangle \overset{(b)}{=} \langle x, y\rangle, $$

where (a) follows from the definition of adjoint; and (b) from (1.108).
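A numerical check of (1.109) and (1.110) with the pair from Example 1.36, again as a Python/NumPy sketch with random complex test vectors. Note that $\sum_k |\langle x, \varphi_k\rangle|^2$ alone generally differs from $\|x\|^2$ here, since neither basis is orthonormal; the equality needs one coefficient sequence from each basis.

```python
import numpy as np

Phi = np.array([[1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=complex)
Phi_t = np.array([[0, -1, 1], [1, 1, -1], [-1, 0, 1]], dtype=complex)
rng = np.random.default_rng(5)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

alpha_t = Phi.conj().T @ x          # tilde-alpha = Phi^* x
beta = Phi_t.conj().T @ y           # beta = Phi_t^* y

print(np.isclose(np.vdot(y, x), np.vdot(beta, alpha_t)))     # (1.110): <x, y> = <tilde-alpha, beta>
print(np.isclose(np.linalg.norm(x) ** 2,
                 np.vdot(Phi_t.conj().T @ x, alpha_t)))       # (1.109): ||x||^2 = <tilde-alpha, alpha>
print(np.linalg.norm(x) ** 2, np.sum(np.abs(alpha_t) ** 2))   # generally different
```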



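Gram Matrix  Theorem 1.44 is not nearly as useful as Theorem 1.39 because it
involves expansions with respect to both bases of the pair. In (1.110), $x = \sum_{k\in\mathcal{K}} \tilde{\alpha}_k\tilde{\varphi}_k$
and $y = \sum_{k\in\mathcal{K}} \beta_k\varphi_k$ (note the use of different bases) so that

$$ \langle x, y\rangle = \langle \tilde{\alpha}, \beta\rangle = \sum_{k\in\mathcal{K}} \tilde{\alpha}_k\beta_k^*. $$

More often, one wants all expansions to be with respect to one basis of the pair; the
other basis of the pair serves as a helper in computing the expansion coefficients. If
$x = \Phi\alpha$ and $y = \Phi\beta$ (both expansions with respect to the basis $\Phi$), then

$$ \langle x, y\rangle = \langle \Phi\alpha, \Phi\beta\rangle \overset{(a)}{=} \langle \Phi^*\Phi\alpha, \beta\rangle \overset{(b)}{=} \langle G\alpha, \beta\rangle, \tag{1.111} $$

where (a) follows from the meaning of adjoint; and (b) from introducing the Gram
matrix or Gramian $G$,

$$ G = \Phi^*\Phi, \tag{1.112a} $$

$$ G_{ik} = \langle \varphi_k, \varphi_i\rangle \quad \text{for every } i, k \in \mathcal{K}, \tag{1.112b} $$

$$ G = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \langle\varphi_{-1},\varphi_{-1}\rangle & \langle\varphi_0,\varphi_{-1}\rangle & \langle\varphi_1,\varphi_{-1}\rangle & \cdots \\ \cdots & \langle\varphi_{-1},\varphi_0\rangle & \langle\varphi_0,\varphi_0\rangle & \langle\varphi_1,\varphi_0\rangle & \cdots \\ \cdots & \langle\varphi_{-1},\varphi_1\rangle & \langle\varphi_0,\varphi_1\rangle & \langle\varphi_1,\varphi_1\rangle & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{1.112c} $$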



The order of factors in (1.111) evokes a product of three matrices:

$$ \langle x, y\rangle = \beta^* G\alpha, \tag{1.113} $$

where as before $\alpha$ and $\beta$ are column vectors. When the set $\Phi$ is an orthonormal
basis, $G$ simplifies to the identity operator on $\ell^2(\mathcal{K})$ and (1.113) simplifies to (1.90).

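Example 1.38 ($\mathbb{C}^3$ inner product computation with bases) Consider the basis $\{\varphi_0, \varphi_1, \varphi_2\} \subset \mathbb{C}^3$ from Example 1.36. The Gram matrix of this basis is

$$G = \Phi^*\Phi = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 2 \\ 1 & 2 & 2 \\ 2 & 2 & 3 \end{bmatrix}.$$

For any $x$ and $y$ in $\mathbb{C}^3$, the expansions with respect to the basis $\Phi$ are $\alpha = \tilde{\Phi}^* x$ and $\beta = \tilde{\Phi}^* y$, where

$$\tilde{\Phi}^* = \begin{bmatrix} 0 & 1 & -1 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}$$

is the analysis operator associated with the basis $\{\tilde{\varphi}_0, \tilde{\varphi}_1, \tilde{\varphi}_2\} \subset \mathbb{C}^3$ from Example 1.36. Then $\langle x, y\rangle = \beta^* G \alpha$ by using (1.113).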
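To see (1.113) at work on Example 1.38, here is a small sketch in plain Python with exact fractions. It forms $G$, computes $\alpha = \tilde{\Phi}^*x$ and $\beta = \tilde{\Phi}^*y$, and compares $\beta^* G\alpha$ with the direct inner product. The vectors `x` and `y` are arbitrary real test choices, so conjugation reduces to transposition.

```python
from fractions import Fraction as F

# Synthesis operator Phi of Example 1.38: basis vectors as columns.
PHI = [[1, 0, 1],
       [1, 1, 1],
       [0, 1, 1]]
# Analysis operator of the dual basis, tilde(Phi)^* = Phi^{-1}.
TPHI_STAR = [[0, 1, -1],
             [-1, 1, 0],
             [1, -1, 1]]

def matmul(A, B):
    return [[sum(F(A[i][t]) * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def apply(A, v):
    return [sum(F(a) * b for a, b in zip(row, v)) for row in A]

G = matmul(transpose(PHI), PHI)          # Gram matrix Phi^* Phi (real case)

def inner_via_gram(x, y):
    alpha = apply(TPHI_STAR, x)          # coefficients of x in the basis Phi
    beta = apply(TPHI_STAR, y)           # coefficients of y in the basis Phi
    G_alpha = apply(G, alpha)
    return sum(b * g for b, g in zip(beta, G_alpha))   # beta^T G alpha

x, y = [2, -1, 3], [1, 4, -2]
print(inner_via_gram(x, y), sum(a * b for a, b in zip(x, y)))
```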

In $\mathbb{C}^N$, it is often natural and easy to use the standard basis for inner product computations. Thus, the previous example may seem to be a complicated way to achieve a simple result. In fact, we have

$$\beta^* G \alpha = (\tilde{\Phi}^* y)^* (\Phi^*\Phi)(\tilde{\Phi}^* x) = y^* \tilde{\Phi}\Phi^*\Phi\tilde{\Phi}^* x = y^* (\Phi\tilde{\Phi}^*)^* (\Phi\tilde{\Phi}^*) x,$$

so in light of (1.108), the inner product $y^* x$ has been altered only by the insertion of identity operators. The use of (1.113) is more valuable when expansion with respect to a biorthogonal basis is natural and precomputation of $G$ avoids a laborious inner product computation, as we illustrate next.

Example 1.39 (Polynomial inner product computation with bases) Consider the polynomials of degree at most 3 under the $\mathcal{L}^2([-1, 1])$ inner product. The basis $\{1, t, t^2, t^3\}$ is easy to use because the expansion coefficients of a polynomial $x(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3$ are read off directly as $\alpha = [\alpha_0\ \ \alpha_1\ \ \alpha_2\ \ \alpha_3]^T$. However, since the basis is not orthonormal, we cannot compute inner products with (1.90). Instead, since

$$\langle t^k, t^\ell\rangle = \int_{-1}^{1} t^k t^\ell \, dt = \frac{1}{k + \ell + 1}\left(1 - (-1)^{k+\ell+1}\right),$$

the Gram matrix of the basis is

$$G = \begin{bmatrix} 2 & 0 & 2/3 & 0 \\ 0 & 2/3 & 0 & 2/5 \\ 2/3 & 0 & 2/5 & 0 \\ 0 & 2/5 & 0 & 2/7 \end{bmatrix},$$

and inner products can be computed without any integration using (1.113).
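The sketch below, in plain Python with exact fractions, builds $G$ from the closed-form monomial inner products and checks (1.113) against a reference computation that multiplies the two polynomials and integrates term by term. The two test polynomials are arbitrary, with real coefficients so that $\beta^*$ is just $\beta^T$.

```python
from fractions import Fraction as F

def monomial_inner(k, l):
    """<t^k, t^l> on [-1, 1]: the integral of t^(k+l), i.e. 2/(k+l+1) or 0."""
    n = k + l
    return F(1 - (-1) ** (n + 1), n + 1)

# Gram matrix of {1, t, t^2, t^3}; G[i][k] = <t^k, t^i> (symmetric here).
G = [[monomial_inner(k, i) for k in range(4)] for i in range(4)]

def inner_via_gram(alpha, beta):
    """<x, y> = beta^T G alpha for real coefficient vectors, as in (1.113)."""
    return sum(beta[i] * G[i][k] * alpha[k] for i in range(4) for k in range(4))

def inner_by_integration(alpha, beta):
    """Reference: multiply the polynomials, then integrate term by term."""
    prod = [F(0)] * 7
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            prod[i + j] += F(a) * b
    return sum(c * monomial_inner(n, 0) for n, c in enumerate(prod))

alpha = [1, -2, 0, 3]     # x(t) = 1 - 2t + 3t^3  (arbitrary test polynomial)
beta = [0, 1, 4, 0]       # y(t) = t + 4t^2       (arbitrary test polynomial)
print(inner_via_gram(alpha, beta), inner_by_integration(alpha, beta))
```

Once $G$ is tabulated, every further inner product costs only a few multiply-adds.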



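Inverse Synthesis and Analysis Equation (1.108) shows that $\tilde{\Phi}^*$ is a right inverse of $\Phi$. It is also true that

$$\tilde{\Phi}^*\Phi = I \quad \text{on } \ell^2(\mathcal{K}), \tag{1.114}$$

making $\tilde{\Phi}^*$ a left inverse of $\Phi$ and furthermore showing that $\tilde{\Phi}^*$ is the unique inverse of $\Phi$. To verify (1.114), make the following computation for any sequence $\alpha$ in $\ell^2(\mathcal{K})$:

$$\tilde{\Phi}^*\Phi\alpha \overset{(a)}{=} \tilde{\Phi}^* \sum_{i\in\mathcal{K}} \alpha_i\varphi_i \overset{(b)}{=} \Big\{\Big\langle \sum_{i\in\mathcal{K}} \alpha_i\varphi_i, \tilde{\varphi}_k\Big\rangle\Big\}_{k\in\mathcal{K}} \overset{(c)}{=} \Big\{\sum_{i\in\mathcal{K}} \alpha_i\langle\varphi_i,\tilde{\varphi}_k\rangle\Big\}_{k\in\mathcal{K}} \overset{(d)}{=} \Big\{\sum_{i\in\mathcal{K}} \alpha_i\delta_{i-k}\Big\}_{k\in\mathcal{K}} \overset{(e)}{=} \{\alpha_k\}_{k\in\mathcal{K}} = \alpha, \tag{1.115}$$

where (a) follows from (1.81); (b) from (1.82); (c) from the linearity in the first argument of the inner product; (d) from biorthogonality of the sets $\{\varphi_k\}_{k\in\mathcal{K}}$ and $\{\tilde{\varphi}_k\}_{k\in\mathcal{K}}$, (1.102); and (e) from the definition of the Kronecker delta sequence, (1.9).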
The fact that operators associated with a biorthogonal pair of bases satisfy

$$\tilde{\Phi}^* = \Phi^{-1} \tag{1.116}$$

can be used to determine $\tilde{\Phi}$ from $\Phi$ such that the sets $\Phi$ and $\tilde{\Phi}$ form a biorthogonal pair of bases. A simple special case is when the Hilbert space $H$ is $\mathbb{C}^N$ (or $\mathbb{R}^N$). Then, the synthesis operator $\Phi$ is an $N \times N$ matrix with the basis vectors as columns, and linear independence of the basis implies that $\Phi$ is invertible. Setting $\tilde{\Phi} = (\Phi^{-1})^*$ means that the vectors of the dual basis are the conjugate transposes of the rows of $\Phi^{-1}$. It is a valuable exercise to check that in Example 1.36, $\tilde{\Phi}$ can be seen as derived from $\Phi$ in this manner.

Example 1.40 (Dual bases, Example 1.28 cont'd) Take the basis $\Phi$ from Example 1.28(ii). Assuming $\theta \neq 0$, and using (1.116), the basis $\Phi$ and its dual basis $\tilde{\Phi}$ are

$$\Phi = \begin{bmatrix} 1 & \cos\theta \\ 0 & \sin\theta \end{bmatrix}, \qquad \tilde{\Phi} = \begin{bmatrix} 1 & 0 \\ -\cot\theta & \csc\theta \end{bmatrix}, \tag{1.117}$$

both shown in Figure 1.23(a) and (b). We can easily check that the biorthogonality condition (1.102) holds. The figure also illustrates how a unit-norm basis does not necessarily lead to a unit-norm dual basis.

We have already computed $\lambda_{\min}$ and $\lambda_{\max}$ for the basis in (a); we can similarly find that for the dual basis,

$$\tilde{\lambda}_{\min} = \frac{1}{\lambda_{\max}}, \qquad \tilde{\lambda}_{\max} = \frac{1}{\lambda_{\min}}, \tag{1.118}$$

shown in Figure 1.23(d). Clearly, the pair is best behaved for $\theta = \pi/2$, when it reduces to an orthonormal basis. As $\theta$ approaches 0, the basis vectors in $\Phi$ become close to collinear, destroying the basis property.

Figure 1.23: A biorthogonal pair of bases in $\mathbb{R}^2$ and their Riesz basis stability constants. (a) Basis $\Phi$ from Figure 1.18(b). (b) Its corresponding dual basis $\tilde{\Phi}$. (c) $\lambda_{\min}$ and $\lambda_{\max}$ for the basis in (a) as a function of $\theta$. (d) $\lambda_{\min}$ and $\lambda_{\max}$ for the dual basis in (b) as a function of $\theta$. The Riesz basis constants here are the reciprocals of the ones in (c).
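The reciprocal relation (1.118) is easy to check numerically. The plain-Python sketch below builds $\Phi$ and $\tilde{\Phi}$ from (1.117) for one assumed angle $\theta = \pi/6$, verifies biorthogonality, and takes the Riesz constants of each basis as the extreme eigenvalues of its Gram matrix (the standard finite-dimensional characterization; the helper names are illustrative).

```python
import math

def bases(theta):
    """Phi from (1.117) and its dual tilde(Phi) = (Phi^{-1})^T (real case of (1.116))."""
    c, s = math.cos(theta), math.sin(theta)
    Phi = [[1.0, c], [0.0, s]]
    Phi_t = [[1.0, 0.0], [-c / s, 1.0 / s]]
    return Phi, Phi_t

def gram(M):
    """Gram matrix of the columns of a real 2x2 matrix M, i.e. M^T M."""
    return [[sum(M[r][i] * M[r][k] for r in range(2)) for k in range(2)]
            for i in range(2)]

def eig_sym_2x2(A):
    """Eigenvalues (smallest, largest) of a real symmetric 2x2 matrix."""
    mean = (A[0][0] + A[1][1]) / 2
    rad = math.hypot((A[0][0] - A[1][1]) / 2, A[0][1])
    return mean - rad, mean + rad

theta = math.pi / 6                      # an arbitrary test angle in (0, pi)
Phi, Phi_t = bases(theta)
# Biorthogonality (1.102): <phi_i, tilde(phi)_k> between columns i and k.
cross = [[sum(Phi[r][i] * Phi_t[r][k] for r in range(2)) for k in range(2)]
         for i in range(2)]
lmin, lmax = eig_sym_2x2(gram(Phi))      # Riesz constants of Phi
tmin, tmax = eig_sym_2x2(gram(Phi_t))    # Riesz constants of tilde(Phi)
print(cross)
print(lmin, lmax, tmin, tmax, 1 / lmax, 1 / lmin)
```

Sweeping `theta` toward 0 makes `lmin` shrink and `tmax` blow up, which is the degeneration described in the text.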



When the Hilbert space is not $\mathbb{C}^N$ (or $\mathbb{R}^N$), the simplicity of the equation $\tilde{\Phi}^* = \Phi^{-1}$ is deceptive. The operators $\tilde{\Phi}^*$ and $\Phi^{-1}$ are mappings from $H$ to $\ell^2(\mathcal{K})$. The analysis operator $\tilde{\Phi}^*$ maps from $H$ to $\ell^2(\mathcal{K})$ through the inner products with $\{\tilde{\varphi}_k\}_{k\in\mathcal{K}}$. To determine $\{\tilde{\varphi}_k\}_{k\in\mathcal{K}}$ from $\Phi^{-1}$ is to interpret the operation of $\Phi^{-1}$ as computing inner products with some set of vectors. We derive that set of vectors next.

Dual Basis So far we have derived properties of a biorthogonal pair of bases. Now we show how to find the unique basis $\tilde{\Phi}$ that completes a biorthogonal pair with a given Riesz basis $\Phi$. As noted above, when $\Phi$ is a basis¹⁹ for $\mathbb{C}^N$, finding the appropriate $\tilde{\Phi}$ is as simple as inverting the matrix $\Phi$. In general, we find $\tilde{\Phi}$ by imposing two key properties: $\Phi$ and $\tilde{\Phi}$ span the same space $H$, and the sets are biorthogonal.

Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}} \subset H$ be a Riesz basis for Hilbert space $H$. To ensure $\operatorname{span}(\tilde{\Phi}) \subseteq \operatorname{span}(\Phi)$, let

$$\tilde{\varphi}_k = \sum_{\ell\in\mathcal{K}} a_{\ell,k}\varphi_\ell, \quad \text{for each } k \in \mathcal{K}. \tag{1.119a}$$

This set of equations can be combined into a single matrix product equation to express the synthesis operator $\tilde{\Phi}$ as

$$\tilde{\Phi} = \Phi A, \tag{1.119b}$$

where the $(\ell, k)$ entry of $A: \ell^2(\mathcal{K}) \to \ell^2(\mathcal{K})$ is $a_{\ell,k}$. Determining the coefficients $a_{\ell,k}$ for $k, \ell \in \mathcal{K}$ specifies the dual basis $\tilde{\Phi}$ through either of the forms of (1.119). The biorthogonality condition (1.102) dictates that for every $i, k \in \mathcal{K}$,

$$\delta_{i-k} = \langle \varphi_i, \tilde{\varphi}_k\rangle \overset{(a)}{=} \Big\langle \varphi_i, \sum_{\ell\in\mathcal{K}} a_{\ell,k}\varphi_\ell\Big\rangle \overset{(b)}{=} \sum_{\ell\in\mathcal{K}} a_{\ell,k}^* \langle \varphi_i, \varphi_\ell\rangle \overset{(c)}{=} \sum_{\ell\in\mathcal{K}} a_{\ell,k}^* G_{\ell,i}, \tag{1.120}$$

where (a) uses (1.119a) to substitute for $\tilde{\varphi}_k$; (b) follows from conjugate linearity in the second argument of the inner product; and (c) uses the Gram matrix defined in (1.112b). By taking the conjugate of both sides of (1.120) and using the Hermitian symmetry of the Gram matrix, we obtain

$$\delta_{i-k} = \sum_{\ell\in\mathcal{K}} G_{i,\ell}\, a_{\ell,k}, \quad \text{for every } i, k \in \mathcal{K}. \tag{1.121a}$$

This set of equations can be combined into a single matrix product equation

$$I = GA. \tag{1.121b}$$

Thus the inverse of the Gram matrix gives the desired coefficients.²⁰
This derivation is summarized by the following theorem:

Theorem 1.45 (Dual basis) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ be a Riesz basis for Hilbert space $H$, and let $A: \ell^2(\mathcal{K}) \to \ell^2(\mathcal{K})$ be the inverse of the Gram matrix of $\Phi$, that is, $A = (\Phi^*\Phi)^{-1}$. Then the set $\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k\in\mathcal{K}}$ defined via

$$\tilde{\varphi}_k = \sum_{\ell\in\mathcal{K}} a_{\ell,k}\varphi_\ell, \quad \text{for each } k \in \mathcal{K}, \tag{1.122a}$$

together with $\Phi$ forms a biorthogonal pair of bases for $H$. The synthesis operator for this basis is given by

$$\tilde{\Phi} = \Phi A = \Phi(\Phi^*\Phi)^{-1}. \tag{1.122b}$$

¹⁹ Recall that in any finite-dimensional space, any basis is a Riesz basis.
²⁰ The Riesz basis condition on $\Phi$ ensures that the inverse exists.

Figure 1.24: (a) Three functions $\varphi_0$, $\varphi_1$, and $\varphi_2$ related by circular shift form a basis $\Phi$. (b) The dual set $\tilde{\Phi} = \{\tilde{\varphi}_0, \tilde{\varphi}_1, \tilde{\varphi}_2\}$ is derived in Example 1.41.



Recall from (1.116) that the synthesis operator associated with the dual basis could be written as $\tilde{\Phi} = (\Phi^{-1})^*$. However, the inverse and adjoint in this expression are difficult to interpret. In contrast, the key virtue of (1.122) is that the inversion is of the Gram matrix, which is easier to interpret because it is an operator from $\ell^2(\mathcal{K})$ to $\ell^2(\mathcal{K})$. This is illustrated in the following finite-dimensional example.

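Example 1.41 (Dual of basis of periodic hat functions) Consider the function

$$\varphi_0(t) = \begin{cases} t, & \text{for } t \in [0, 1]; \\ 2 - t, & \text{for } t \in (1, 2]; \\ 0, & \text{for } t \in (2, 3] \end{cases} \tag{1.123}$$

in $\mathcal{L}^2([0, 3])$ and its circular shifts by 1, as shown in Figure 1.24. The set $\Phi = \{\varphi_0, \varphi_1, \varphi_2\}$ is a basis for a subspace $S = \operatorname{span}(\Phi) \subset \mathcal{L}^2([0, 3])$. This subspace is the set of functions $x$ that satisfy $x(0) = x(3)$ and are piecewise linear on $[0, 3]$ with breakpoints at 1 and 2.

We wish to find the basis $\tilde{\Phi} = \{\tilde{\varphi}_0, \tilde{\varphi}_1, \tilde{\varphi}_2\}$ that forms a biorthogonal pair with $\Phi$. The Gram matrix of $\Phi$ is

$$G = \begin{bmatrix} 2/3 & 1/6 & 1/6 \\ 1/6 & 2/3 & 1/6 \\ 1/6 & 1/6 & 2/3 \end{bmatrix}. \tag{1.124}$$

Using its inverse in (1.122a) yields

$$\tilde{\varphi}_0 = \tfrac{5}{3}\varphi_0 - \tfrac{1}{3}\varphi_1 - \tfrac{1}{3}\varphi_2,$$

$$\tilde{\varphi}_1 = -\tfrac{1}{3}\varphi_0 + \tfrac{5}{3}\varphi_1 - \tfrac{1}{3}\varphi_2,$$

$$\tilde{\varphi}_2 = -\tfrac{1}{3}\varphi_0 - \tfrac{1}{3}\varphi_1 + \tfrac{5}{3}\varphi_2.$$

These functions are depicted in Figure 1.24. Since each $\tilde{\varphi}_k$ is a linear combination of the $\varphi_k$'s, it is clear that $\operatorname{span}(\tilde{\Phi}) \subseteq \operatorname{span}(\Phi)$. One can also show that $\operatorname{span}(\Phi) \subseteq \operatorname{span}(\tilde{\Phi})$. For an intuitive understanding, note that each $\tilde{\varphi}_k$ satisfies $\tilde{\varphi}_k(0) = \tilde{\varphi}_k(3)$ and is piecewise linear on $[0, 3]$ with breakpoints at 1 and 2; thus, the sets span the same subspace. The solution is unique, whereas many sets of functions satisfy the biorthogonality condition (1.102) without satisfying $\operatorname{span}(\tilde{\Phi}) = \operatorname{span}(\Phi)$.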
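Example 1.41 can be reproduced end to end with exact arithmetic. In the plain-Python sketch below, inner products are computed with Simpson's rule on each unit interval, which is exact here because every product of two of these functions is a quadratic polynomial on such an interval; the Gram matrix is inverted by Gauss-Jordan elimination, and the dual functions are formed as in (1.122a). The helper names (`hat`, `inner`, `inverse`, `dual`) are illustrative choices, not from the text.

```python
from fractions import Fraction as F

def hat(k):
    """phi_k: the hat function of (1.123), circularly shifted by k on [0, 3]."""
    def f(t):
        u = (t - k) % 3                  # position within one period of phi_0
        if u <= 1:
            return u
        if u <= 2:
            return 2 - u
        return F(0)
    return f

def inner(f, g):
    """<f, g> on [0, 3] by Simpson's rule on each unit interval.

    Exact here: on each [j, j+1] both functions are linear, so f*g is quadratic.
    """
    total = F(0)
    for j in range(3):
        a, m, b = F(j), F(2 * j + 1, 2), F(j + 1)
        total += (f(a) * g(a) + 4 * f(m) * g(m) + f(b) * g(b)) / 6
    return total

def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix with rational entries."""
    n = len(M)
    A = [list(map(F, row)) + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c]
                A[r] = [vr - fac * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

phi = [hat(k) for k in range(3)]
G = [[inner(phi[k], phi[i]) for k in range(3)] for i in range(3)]   # (1.112b)
A = inverse(G)                           # a_{l,k} = entries of G^{-1}

def dual(k):
    """tilde(phi)_k = sum_l a_{l,k} phi_l, as in (1.122a)."""
    return lambda t: sum(A[l][k] * phi[l](t) for l in range(3))

tphi = [dual(k) for k in range(3)]
print([[str(v) for v in row] for row in A])   # 5/3 on the diagonal, -1/3 elsewhere
print([[str(inner(phi[i], tphi[k])) for k in range(3)] for i in range(3)])
```

The second printout is the identity pattern, confirming biorthogonality (1.102).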

The dual of the dual of a basis is the original basis, and a basis is its own dual if and only if it is an orthonormal basis. Also, if the Riesz basis constants of the basis $\Phi$ are $\lambda_{\min}$ and $\lambda_{\max}$, then $\tilde{\Phi}$ is a Riesz basis with constants $1/\lambda_{\max}$ and $1/\lambda_{\min}$. Establishing these facts formally is left for Exercise 1.44. As we mentioned earlier, it can be advantageous for numerical computations to have $\lambda_{\min} \approx \lambda_{\max}$. The property $\lambda_{\min}/\lambda_{\max} \approx 1$ is then maintained by the dual.

Dual Coefficients As we noted earlier in reference to computation of inner products, it is often convenient to use only one basis explicitly. Then, we face the problem of finding expansions with respect to the basis $\Phi$ from analysis with $\Phi^*$. Unless $\tilde{\Phi}^* = \Phi^*$, in which case $\Phi$ is an orthonormal basis, the coefficients obtained with analysis by $\Phi^*$ must be adjusted to be the right ones to use in synthesis by $\Phi$. The adjustment of coefficients is analogous to computation of the dual basis.

To have an expansion with respect to the basis $\Phi$ from analysis with $\Phi^*$, we seek an operator $A: \ell^2(\mathcal{K}) \to \ell^2(\mathcal{K})$ such that

$$x = \Phi A \Phi^* x \quad \text{for every } x \in H.$$

It is easy to verify that $A = (\Phi^*\Phi)^{-1}$, the inverse of the Gram matrix, is the desired operator: by (1.122b), $\Phi A \Phi^* = \tilde{\Phi}\Phi^*$, which is the identity by (1.108). In terms of the coefficient sequences defined in (1.104b) and (1.106), $A$ maps $\beta$ to $\alpha$. Naturally, the Gram matrix maps $\alpha$ to $\beta$.
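A minimal sketch of the coefficient adjustment, assuming the real basis of Example 1.38: analysis with $\Phi^*$ gives $\beta$, solving $G\alpha = \beta$ gives $\alpha$, and synthesis $\Phi\alpha$ returns $x$. Solving the linear system instead of forming $G^{-1}$ explicitly is our own (standard) numerical choice; the vector `x` is an arbitrary test input.

```python
from fractions import Fraction as F

# Basis of Example 1.38 (columns of Phi), assumed here for illustration.
PHI = [[1, 0, 1],
       [1, 1, 1],
       [0, 1, 1]]

def solve(M, b):
    """Solve M z = b exactly by Gauss-Jordan elimination (M square, nonsingular)."""
    n = len(M)
    A = [[F(v) for v in row] + [F(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c] / A[c][c]
                A[r] = [vr - fac * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

phis = [list(col) for col in zip(*PHI)]                  # phi_0, phi_1, phi_2
G = [[sum(a * b for a, b in zip(phis[k], phis[i])) for k in range(3)]
     for i in range(3)]                                   # G[i][k] = <phi_k, phi_i>

x = [2, -1, 3]
beta = [sum(a * b for a, b in zip(x, p)) for p in phis]   # analysis with Phi^*
alpha = solve(G, beta)                                     # alpha = G^{-1} beta
x_rec = [sum(alpha[k] * phis[k][r] for k in range(3)) for r in range(3)]  # Phi alpha
print(beta, [str(a) for a in alpha], [str(v) for v in x_rec])
```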

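Oblique Projection Similarly to the truncation of an orthonormal expansion giving an orthogonal projection (Theorem 1.40), truncation of (1.105a) or (1.106) gives an oblique projection. The proof of the following result is left to Exercise 1.45.

Theorem 1.46 Given sets $\Phi_{\mathcal{I}} = \{\varphi_i\}_{i\in\mathcal{I}} \subset H$ and $\tilde{\Phi}_{\mathcal{I}} = \{\tilde{\varphi}_i\}_{i\in\mathcal{I}} \subset H$ satisfying

$$\langle \varphi_i, \tilde{\varphi}_k\rangle = \delta_{i-k} \quad \text{for every } i, k \in \mathcal{I},$$

for any $x$ in $H$,

$$P_{\mathcal{I}} x = \sum_{k\in\mathcal{I}} \langle x, \tilde{\varphi}_k\rangle \varphi_k \tag{1.125a}$$

$$= \Phi_{\mathcal{I}} \tilde{\Phi}_{\mathcal{I}}^* x \tag{1.125b}$$

is a projection of $x$ onto $S_{\mathcal{I}} = \operatorname{span}(\{\varphi_k\}_{k\in\mathcal{I}})$. The residual satisfies $x - P_{\mathcal{I}} x \perp \tilde{S}_{\mathcal{I}}$, where $\tilde{S}_{\mathcal{I}} = \operatorname{span}(\{\tilde{\varphi}_k\}_{k\in\mathcal{I}})$.

Figure 1.25: Example of an oblique projection. The projection is onto $S_{\mathcal{I}}$, the subspace spanned by $\varphi_0$. The residual is orthogonal to $\tilde{S}_{\mathcal{I}}$, the subspace spanned by the biorthogonal vector $\tilde{\varphi}_0$.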



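Example 1.42 (Oblique projection, Example 1.40 cont'd) We continue our discussion of Example 1.40 and illustrate oblique projection. Define $P_{\mathcal{I}}$ via (1.125) as

$$P_{\mathcal{I}} x = \langle x, \tilde{\varphi}_0\rangle \varphi_0 = \Phi_{\mathcal{I}}\tilde{\Phi}_{\mathcal{I}}^* x,$$

with

$$\Phi_{\mathcal{I}} = [1\ \ 0]^T, \qquad \tilde{\Phi}_{\mathcal{I}} = [1\ \ {-\cot\theta}]^T.$$

Figure 1.25 illustrates the projection (not orthogonal anymore), the subspace $S_{\mathcal{I}}$, the residual $x - P_{\mathcal{I}} x$, and the subspace $\tilde{S}_{\mathcal{I}}$.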
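A small numerical sketch of Example 1.42, assuming $\theta = \pi/3$ and an arbitrary test vector `x`. It forms $P_{\mathcal{I}}$ from (1.125b), then checks that applying it twice changes nothing, that the residual is orthogonal to $\tilde{\varphi}_0$, and that the matrix is not symmetric, which is why the projection is oblique rather than orthogonal.

```python
import math

def oblique_projector(theta):
    """P_I = Phi_I tilde(Phi_I)^T with Phi_I = [1, 0]^T, tilde(Phi_I) = [1, -cot(theta)]^T."""
    phi0 = [1.0, 0.0]
    tphi0 = [1.0, -math.cos(theta) / math.sin(theta)]
    # Outer product phi0 tphi0^T (real case, so the adjoint is the transpose).
    return [[phi0[r] * tphi0[c] for c in range(2)] for r in range(2)], tphi0

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

theta = math.pi / 3                      # an arbitrary test angle
P, tphi0 = oblique_projector(theta)
x = [0.5, 2.0]                           # an arbitrary test vector
Px = matvec(P, x)
residual = [xi - pi for xi, pi in zip(x, Px)]
PPx = matvec(P, Px)                      # applying P twice changes nothing
print(Px, PPx)
print(sum(r * t for r, t in zip(residual, tphi0)))   # ~0: residual is orthogonal to tphi0
print(P[0][1], P[1][0])                  # unequal: P is not symmetric
```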

While the above theorem gives an important property, it is not as useful as Theorem 1.40 because oblique projections do not solve best approximation problems.

Decomposition By applying Theorem 1.46 with any single-element set $\mathcal{I}$, we see that any one term of (1.105a) or (1.106) is an oblique projection onto a 1-dimensional subspace. Thus, a biorthogonal pair of bases induces a pair of decompositions

$$H = \bigoplus_{k\in\mathcal{K}} S_{\{k\}} \quad \text{and} \quad H = \bigoplus_{k\in\mathcal{K}} \tilde{S}_{\{k\}},$$

where $S_{\{k\}} = \operatorname{span}(\varphi_k)$ and $\tilde{S}_{\{k\}} = \operatorname{span}(\tilde{\varphi}_k)$. Actually, because of linear independence and completeness, any basis gives a decomposition of the form above. A key merit of a decomposition that comes from a biorthogonal pair of bases is that the expansion coefficients are determined simply as in (1.104).

Best Approximation and the Normal Equations According to the projection theorem (Theorem 1.26), given a closed subspace $S$ of a Hilbert space $H$, the best approximation of a vector $x$ in $H$ is given by the orthogonal projection of $x$ onto $S$. In Theorem 1.40 we saw how to compute this orthogonal projection in one special case. We now derive a general methodology using bases.

Denote the orthogonal projection of $x$ onto $S$ by $\hat{x}$. According to the projection theorem, $\hat{x}$ is uniquely determined by $\hat{x} \in S$ and $x - \hat{x} \perp S$. Given a basis $\{\varphi_k\}_{k\in\mathcal{I}}$ for $S$, the projection being in $S$ is ensured by

$$\hat{x} = \sum_{k\in\mathcal{I}} \beta_k\varphi_k \tag{1.126a}$$

for some coefficient sequence $\beta$, and the residual being orthogonal to $S$ is expressed as

$$\langle x - \hat{x}, \varphi_i\rangle = 0 \quad \text{for every } i \in \mathcal{I}. \tag{1.126b}$$

Substituting (1.126a) into (1.126b) gives

$$\langle x, \varphi_i\rangle = \Big\langle \sum_{k\in\mathcal{I}} \beta_k\varphi_k, \varphi_i\Big\rangle = \sum_{k\in\mathcal{I}} \beta_k\langle\varphi_k, \varphi_i\rangle \quad \text{for every } i \in \mathcal{I}.$$

Solving these equations gives the following result:



Theorem 1.47 (Normal equations) Given a vector $x$ and a linearly independent set $\{\varphi_k\}_{k\in\mathcal{I}}$ in a separable Hilbert space $H$, the vector closest to $x$ in $\operatorname{span}(\{\varphi_k\}_{k\in\mathcal{I}})$ is

$$\hat{x} = \sum_{k\in\mathcal{I}} \beta_k\varphi_k \tag{1.127a}$$

$$= \Phi\beta, \tag{1.127b}$$

where $\beta$ is the unique solution to the system of equations

$$\sum_{k\in\mathcal{I}} \beta_k\langle\varphi_k, \varphi_i\rangle = \langle x, \varphi_i\rangle \quad \text{for every } i \in \mathcal{I}, \quad \text{or} \tag{1.128a}$$

$$\Phi^*\Phi\beta = \Phi^* x. \tag{1.128b}$$

Equations (1.128) are called normal equations because they express the normality (orthogonality) of the residual and the subspace (from (1.126b)). In operator notation, combining (1.127b) and (1.128b) leads to

$$\hat{x} = \Phi(\Phi^*\Phi)^{-1}\Phi^* x = P x. \tag{1.129}$$

It is then easy to check that $P$ is an orthogonal projection operator (Exercise 1.46). Invertibility of the Gram matrix $\Phi^*\Phi$ follows from the linear independence of $\{\varphi_k\}_{k\in\mathcal{I}}$ (Exercise 1.46). If the set $\{\varphi_k\}_{k\in\mathcal{I}}$ is not linearly independent, the projection theorem still ensures that $\hat{x}$ is unique, but naturally its expansion with respect to $\{\varphi_k\}_{k\in\mathcal{I}}$ is not unique. We illustrate these concepts with an example.

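Example 1.43 (Normal equations in $\mathbb{R}^3$) Let $\varphi_0 = [1,\ 1,\ 0]^T$ and $\varphi_1 = [0,\ 1,\ 1]^T$. Given a vector $x = [1\ \ 1\ \ 1]^T$, according to Theorem 1.47 the vector in $\operatorname{span}(\{\varphi_0, \varphi_1\})$ closest to $x$ is

$$\hat{x} = \beta_0\varphi_0 + \beta_1\varphi_1,$$

with $\beta$ the unique solution to (1.128b), which simplifies to

$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}.$$

Solving the system yields $\beta_0 = \beta_1 = 2/3$, leading to

$$\hat{x} = \tfrac{2}{3}(\varphi_0 + \varphi_1) = \begin{bmatrix} 2/3 \\ 4/3 \\ 2/3 \end{bmatrix}.$$

We can easily check that the residual $x - \hat{x}$ is orthogonal to $\operatorname{span}(\{\varphi_0, \varphi_1\})$:

$$x - \hat{x} = \begin{bmatrix} 1/3 \\ -1/3 \\ 1/3 \end{bmatrix} \perp \alpha_0\varphi_0 + \alpha_1\varphi_1 \quad \text{for all } \alpha_0, \alpha_1.$$

Now let $\varphi_2 = [1,\ 0,\ -1]^T$. The vector in $\operatorname{span}(\{\varphi_0, \varphi_1, \varphi_2\})$ closest to $x$ is

$$\hat{x} = \beta_0\varphi_0 + \beta_1\varphi_1 + \beta_2\varphi_2,$$

where $\beta$ satisfies

$$\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}.$$

The solutions for $\beta$ are not unique, but all solutions yield $\hat{x} = [2/3,\ 4/3,\ 2/3]^T$ as before. This is as expected since $\operatorname{span}(\{\varphi_0, \varphi_1, \varphi_2\}) = \operatorname{span}(\{\varphi_0, \varphi_1\})$ (indeed, $\varphi_2 = \varphi_0 - \varphi_1$).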
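The following sketch redoes Example 1.43 with exact fractions in plain Python: Cramer's rule for the $2 \times 2$ normal equations, an orthogonality check on the residual, and, for the dependent set, a check that a one-parameter family of solutions all yield the same $\hat{x}$. The null vector $[1, -1, -1]^T$ of the $3 \times 3$ Gram matrix follows from $\varphi_2 = \varphi_0 - \varphi_1$; the sampled parameter values are arbitrary.

```python
from fractions import Fraction as F

phi = [[1, 1, 0], [0, 1, 1], [1, 0, -1]]    # phi_0, phi_1, phi_2 of Example 1.43
x = [1, 1, 1]

def dot(u, v):
    return sum(F(a) * b for a, b in zip(u, v))

def synth(beta, vecs):
    """Phi beta = sum_k beta_k phi_k."""
    return [sum(b * v[r] for b, v in zip(beta, vecs)) for r in range(3)]

# Normal equations (1.128b) for {phi_0, phi_1}: a 2x2 system solved by Cramer's rule.
G = [[dot(phi[k], phi[i]) for k in range(2)] for i in range(2)]
rhs = [dot(x, phi[i]) for i in range(2)]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
beta = [(rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det,
        (G[0][0] * rhs[1] - rhs[0] * G[1][0]) / det]
x_hat = synth(beta, phi[:2])
residual = [F(a) - b for a, b in zip(x, x_hat)]
print([str(b) for b in beta], [str(v) for v in x_hat],
      [str(dot(residual, p)) for p in phi[:2]])

# With the dependent set {phi_0, phi_1, phi_2}, solutions differ by multiples of
# the null vector [1, -1, -1] of the Gram matrix, yet all give the same x_hat.
G3 = [[dot(phi[k], phi[i]) for k in range(3)] for i in range(3)]
rhs3 = [dot(x, p) for p in phi]
for t in (F(0), F(1), F(-5, 2)):
    b3 = [F(2, 3) + t, F(2, 3) - t, -t]
    assert [dot(row, b3) for row in G3] == rhs3      # b3 solves (1.128b)
    assert synth(b3, phi) == x_hat                   # and yields the same x_hat
```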

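In the special case that $\{\varphi_k\}_{k\in\mathcal{I}}$ is an orthonormal set, the normal equations (1.128) simplify greatly:

$$\langle x, \varphi_i\rangle \overset{(a)}{=} \sum_{k\in\mathcal{I}} \beta_k\langle\varphi_k, \varphi_i\rangle \overset{(b)}{=} \sum_{k\in\mathcal{I}} \beta_k\delta_{k-i} \overset{(c)}{=} \beta_i, \quad \text{for every } i \in \mathcal{I},$$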



a3.0 [October 2011] CC by-nc-nd

Comments to book-errata@FourierAndWavelets.org

Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

94 Chapter 1. From Euclid to Hilbert

where (a) is (1.128); (b) follows from orthonormality; and (c) from the definition of the Kronecker delta sequence, (1.9). Thus the coefficients of the expansion of $\hat{x}$ with respect to $\{\varphi_k\}_{k\in\mathcal{I}}$ come from analysis with the same set of vectors, exactly as in Theorem 1.40.

Solving the normal equations is also simplified by having available $\{\tilde{\varphi}_k\}_{k\in\mathcal{I}}$ that together with $\{\varphi_k\}_{k\in\mathcal{I}}$ forms a biorthogonal pair of bases for the subspace of interest. Then by combination of (1.122b) and (1.129), the best approximation is $\hat{x} = \Phi\tilde{\Phi}^* x$.

In general, $\{\varphi_k\}_{k\in\mathcal{I}}$ is not an orthonormal set, and we may want to express both $x$ and $\hat{x}$ through expansions with a different basis $\{\psi_k\}_{k\in\mathcal{K}}$ that is not necessarily orthonormal:

$$x = \sum_{k\in\mathcal{K}} \alpha_k\psi_k \quad \text{and} \quad \hat{x} = \sum_{k\in\mathcal{K}} \hat{\alpha}_k\psi_k.$$

Properties of the mapping from $\alpha$ to $\hat{\alpha}$ are established in Exercise 1.47. In particular, orthogonal projection in $H$ corresponds to orthogonal projection in the coefficient space if and only if expansions are with respect to an orthonormal basis.

Successive Approximation Continuing our discussion of best approximation, now consider the computation of a sequence of best approximations in subspaces of increasing dimension. Let $\{\varphi_i\}_{i\in\mathbb{N}}$ be a linearly independent set, and for each $k \in \mathbb{N}$, let $S_k = \operatorname{span}(\{\varphi_0, \varphi_1, \ldots, \varphi_{k-1}\})$. Let $\hat{x}^{(k)}$ denote the best approximation of $x$ in $S_k$.²¹ In Section 1.5.2 we found a simple recursive computation for the expansion of $\hat{x}^{(k)}$ with respect to $\{\varphi_i\}_{i\in\mathbb{N}}$ for the case that $\{\varphi_i\}_{i\in\mathbb{N}}$ is an orthonormal set; see (1.99). Here, recursive computation is made more complicated by the lack of orthonormality of the basis.

The spaces $S_k$ and $S_{k+1}$ are nested with $S_k \subset S_{k+1}$, so approximating $x$ with a vector from $S_{k+1}$ instead of one from $S_k$ cannot make the approximation quality worse; improvement is obtained by capturing the component of $x$ that could not be captured before. The nesting of subspaces can be expressed as

$$S_{k+1} = S_k \oplus T_k, \tag{1.130}$$

where the one-dimensional subspace $T_k$ is not uniquely specified. If we choose $T_k$ to make this direct sum an orthogonal decomposition, then the increment $\hat{x}^{(k+1)} - \hat{x}^{(k)}$ will simply be the orthogonal projection of $x$ onto $T_k$. The decomposition (1.130) is orthogonal when $T_k = \operatorname{span}(\psi_k)$ with $\psi_k \perp S_k$, so we get the desired direct sum by choosing $\psi_k$ parallel to the residual in orthogonally projecting $\varphi_k$ to $S_k$. This approach simplifies the computation of the increment at the cost of requiring $\psi_k$. It can yield computational savings when the entire sequence of approximations is desired and the $\psi_k$'s are computed recursively through Gram-Schmidt orthogonalization.

Let $\hat{x}^{(0)} = 0$, and for $k = 0, 1, \ldots$, make the following computations. First, compute $\psi_k$ orthogonal to $S_k$ and, for convenience in other computations, of unit norm:

$$v_k = \sum_{i=0}^{k-1} \langle\varphi_k, \psi_i\rangle\psi_i, \tag{1.131a}$$

$$\psi_k = \frac{\varphi_k - v_k}{\|\varphi_k - v_k\|}. \tag{1.131b}$$

In this computation, $v_k$ is the orthogonal projection of $\varphi_k$ onto $S_k$ since $\{\psi_i\}_{i=0}^{k-1}$ is an orthonormal basis for $S_k$; see (1.97a). With this intermediate orthogonalization, we have

$$\hat{x}^{(k+1)} = \hat{x}^{(k)} + \langle x, \psi_k\rangle\psi_k. \tag{1.131c}$$

Exercise 1.48 explores the connection between this algorithm and the normal equations.

²¹ By these definitions, $S_0 = \{0\}$ and $\hat{x}^{(0)} = 0$.
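The recursion (1.131) is short enough to sketch directly. The plain-Python version below assumes real vectors. With the first two vectors of Example 1.43 and $x = [1\ 1\ 1]^T$ it reproduces the best approximation found there; the third vector $[0, 0, 1]^T$ is an arbitrary addition that completes a basis for $\mathbb{R}^3$, so the last approximation recovers $x$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def successive_approximations(phis, x):
    """Return [x_hat^(0), x_hat^(1), ...] computed via (1.131a)-(1.131c).

    phis: linearly independent real vectors phi_0, phi_1, ...; x: target vector.
    """
    n = len(x)
    psis = []                                  # orthonormal psi_0, psi_1, ...
    x_hat = [0.0] * n                          # x_hat^(0) = 0
    approx = [list(x_hat)]
    for phi in phis:
        # (1.131a): v_k = orthogonal projection of phi_k onto S_k
        v = [0.0] * n
        for psi in psis:
            c = dot(phi, psi)
            v = [vi + c * pi for vi, pi in zip(v, psi)]
        # (1.131b): normalize the component of phi_k orthogonal to S_k
        r = [a - b for a, b in zip(phi, v)]
        norm = math.sqrt(dot(r, r))
        psi = [ri / norm for ri in r]
        psis.append(psi)
        # (1.131c): add the orthogonal projection of x onto T_k = span(psi_k)
        c = dot(x, psi)
        x_hat = [xi + c * pi for xi, pi in zip(x_hat, psi)]
        approx.append(list(x_hat))
    return approx

phis = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
approx = successive_approximations(phis, [1, 1, 1])
for k, a in enumerate(approx):
    print(k, [round(v, 6) for v in a])
```

Each step costs one projection onto a single new direction, instead of re-solving the normal equations from scratch.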

1.5.4 Frames

Bases are sets of vectors that are both linearly independent and complete (see Definition 1.33). Completeness ensures the existence of expansions; linear independence ensures that the expansions are unique. In a finite-dimensional space, linear independence upper bounds the number of vectors by the dimension of the space while completeness lower bounds the number of vectors by the dimension of the space; thus, there are exactly as many vectors as the dimension of the space. Frames are more general than bases because they are complete but not necessarily linearly independent. In a finite-dimensional space, a frame must have at least as many vectors as the dimension of the space. In infinite-dimensional spaces, a frame must have infinitely many vectors, and imposing something analogous to the Riesz basis condition (1.80) prevents certain pathologies.

Why would we want more than the minimum number of vectors for completeness? There are several possible disadvantages: When linear independence is lost, uniqueness of expansions is lost with it, and it would seem at first glance that having a larger set of vectors implies more computations in both analysis and synthesis. The primary advantages come from flexibility in design: Fixing analysis leaves flexibility in synthesis and vice versa; and we will see in Chapters 10, 11, and 12 that frames can have additional desirable properties unavailable with bases.



Definition 1.48 (Frame) The set of vectors $\Phi = \{\varphi_k\}_{k\in\mathcal{J}} \subset H$, where $\mathcal{J}$ is finite or countably infinite, is called a frame for Hilbert space $H$ when the largest $\lambda_{\min}$ and smallest $\lambda_{\max}$ such that

$$\lambda_{\min}\|x\|^2 \le \sum_{k\in\mathcal{J}} |\langle x, \varphi_k\rangle|^2 \le \lambda_{\max}\|x\|^2, \quad \text{for every } x \text{ in } H, \tag{1.132}$$

satisfy $0 < \lambda_{\min} \le \lambda_{\max} < \infty$. The constants $\lambda_{\min}$ and $\lambda_{\max}$ are called frame bounds.

Figure 1.26: Example frame functions from $\Phi \cup \Phi^+$ (black for functions from $\Phi$ and red for functions from $\Phi^+$). (a) $\varphi_0(t)$ and $\varphi_0^+(t)$. (b) $\varphi_1(t)$ and $\varphi_1^+(t)$. (c) $\varphi_2(t)$ and $\varphi_2^+(t)$.



A frame is sometimes called a Riesz sequence. This highlights the similarity of (1.132) to condition (1.80) in Definition 1.34 for Riesz bases and also that a frame is not necessarily a basis.

Let us immediately compare and contrast Definitions 1.34 and 1.48:

(i) The notation for the index set has been changed from $\mathcal{K}$ to $\mathcal{J}$ to reflect that these are not generally the same size when $H$ has finite dimension.

(ii) The definition of a frame in Definition 1.48 uses the set $\Phi$ in analysis; in contrast, the definition of a basis in Definition 1.34 uses the set $\Phi$ in synthesis. Nevertheless, both bases and frames can be used for both analysis and synthesis. A frame generally lacks linear independence, so an expansion of the form $x = \sum_{k\in\mathcal{J}} \alpha_k\varphi_k$ is not necessarily unique; this prevents a closer parallel in the definitions.

(iii) If $\Phi$ and $\tilde{\Phi}$ form a biorthogonal pair of bases, then the unique expansion with respect to $\tilde{\Phi}$ is obtained through analysis with $\Phi$; see (1.107). In this case, comparison of Definitions 1.34 and 1.48 shows that $\tilde{\Phi}$ being a Riesz basis with constants $1/\lambda_{\max}$ and $1/\lambda_{\min}$ implies that $\Phi$ is a frame with frame bounds $\lambda_{\min}$ and $\lambda_{\max}$. Since the dual of a Riesz basis is a Riesz basis, we reach the simple conclusion that any Riesz basis is a frame.

Uses of frames in analysis and synthesis will be established shortly. Exercise 1.49 explores the differences between Definitions 1.34 and 1.48 further.

Example 1.44 (Frame of cosine functions) Starting with $\Phi = \{\varphi_k\}_{k\in\mathbb{N}} \subset \mathcal{L}^2([-\tfrac{1}{2}, \tfrac{1}{2}])$ from (1.21), define the set of functions $\Phi^+ = \{\varphi_k^+\}_{k\in\mathbb{N}}$ by multiplying each $\varphi_k$ by $\sqrt{2}\cos(2\pi t)$:

$$\varphi_k^+(t) = \sqrt{2}\cos(2\pi t)\,\varphi_k(t), \quad k \in \mathbb{N}.$$

A few functions from $\Phi \cup \Phi^+$ are shown in Figure 1.26.

We know already from Example 1.31 that $\Phi$ is an orthonormal basis for the closure of its span, $S = \overline{\operatorname{span}}(\Phi)$. The union $\Phi \cup \Phi^+$ is a frame for $S$. To see that the closure of the span of $\Phi \cup \Phi^+$ is not larger than $S$, note that each $\varphi_k^+$ can be written as a linear combination of elements of $\Phi$. For $k \in \mathbb{Z}^+$,

$$\varphi_k^+(t) = \sqrt{2}\cos(2\pi t)\,\varphi_k(t) = 2\cos(2\pi t)\cos(2\pi k t) = \cos(2\pi(k-1)t) + \cos(2\pi(k+1)t)$$

$$= \begin{cases} \varphi_{k-1}(t) + \tfrac{1}{\sqrt{2}}\varphi_{k+1}(t), & \text{for } k = 1; \\ \tfrac{1}{\sqrt{2}}\varphi_{k-1}(t) + \tfrac{1}{\sqrt{2}}\varphi_{k+1}(t), & \text{for } k = 2, 3, \ldots \end{cases}$$

Being able to write $\varphi_k^+$ as a linear combination of $\{\varphi_k\}_{k\in\mathbb{N}}$ becomes clear from the Fourier series studied in Section 3.5. Computing the frame bounds of this frame is left for Exercise 1.50.

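Operators Associated with Frames Analogously to bases, we can define the synthesis operator associated with $\{\varphi_k\}_{k\in\mathcal{J}}$ to be

$$\Phi: \ell^2(\mathcal{J}) \to H, \quad \text{with} \quad \Phi\alpha = \sum_{k\in\mathcal{J}} \alpha_k\varphi_k. \tag{1.133}$$

The second inequality of (1.132) implies that the norm of this linear operator is finite and the operator thus bounded.

Similarly, we define the analysis operator associated with $\{\varphi_k\}_{k\in\mathcal{J}}$ to be

$$\Phi^*: H \to \ell^2(\mathcal{J}), \quad \text{with} \quad (\Phi^* x)_k = \langle x, \varphi_k\rangle, \quad k \in \mathcal{J}. \tag{1.134}$$

The norm of the analysis operator is the same as that of the synthesis operator. The power of the operator notation can be seen in rephrasing (1.132) as

$$\lambda_{\min} I \le \Phi\Phi^* \le \lambda_{\max} I, \tag{1.135}$$

where the inequalities are in the sense of Hermitian operators, that is, $A \le B$ means $\langle Ax, x\rangle \le \langle Bx, x\rangle$ for every $x$. The first inequality can be derived as follows:

$$\langle (\Phi\Phi^* - \lambda_{\min} I)x, x\rangle \overset{(a)}{=} \langle \Phi\Phi^* x, x\rangle - \langle \lambda_{\min} I x, x\rangle \overset{(b)}{=} \langle \Phi\Phi^* x, x\rangle - \lambda_{\min}\langle x, x\rangle \overset{(c)}{=} \langle \Phi^* x, \Phi^* x\rangle - \lambda_{\min}\langle x, x\rangle = \|\Phi^* x\|^2 - \lambda_{\min}\|x\|^2 \overset{(d)}{\ge} 0,$$

where (a) follows from distributivity of the inner product; (b) from the linearity in the first argument of the inner product and the meaning of the identity operator; (c) from the definition of adjoint; and (d) from the first inequality of (1.132). The second inequality of (1.135) can be derived similarly.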

Because $\Phi\Phi^*$ is a Hermitian operator, the operator analogue of (1.225) holds; thus, the frame bounds are the smallest and largest eigenvalues of $\Phi\Phi^*$. This gives an easy way to find the frame bounds, as we illustrate in the following example.

Example 1.45 (Frames in $\mathbb{R}^2$) In (1.14), we defined a frame. The vectors are clearly not linearly independent; however, they do satisfy (1.132). To compute the frame bounds, we could follow the path from Example 1.28: compute $\sum_{k\in\mathcal{J}} |\langle x, \varphi_k\rangle|^2$ and find the frame bounds as the infimum and supremum of $\sum_{k\in\mathcal{J}} |\langle x, \varphi_k\rangle|^2 / (x_0^2 + x_1^2)$. A much easier way is to use the operator notation, where $\Phi$ is given in (1.16a). This $\Phi$ is a rectangular matrix, illustrating the fact that a frame is an overcomplete expansion. Then $\Phi\Phi^*$ is

$$\Phi\Phi^* = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix},$$

where we have performed an eigendecomposition on the Hermitian matrix $\Phi\Phi^*$ via (1.210a). We can immediately read the smallest and largest eigenvalues, $\lambda_{\min} = 1$ and $\lambda_{\max} = 3$, as the frame bounds.
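To see both routes of Example 1.45 side by side, the sketch below computes $\Phi\Phi^*$ and its extreme eigenvalues, and also samples the ratio from Example 1.28's route on a grid of unit vectors. The matrix `PHI` is an assumption: (1.16a) is not reproduced in this section, so we use a three-vector frame whose $\Phi\Phi^*$ equals the matrix shown above (the ratio depends on $\Phi$ only through $\Phi\Phi^*$).

```python
import math

# Assumed synthesis matrix (columns phi_0, phi_1, phi_2) with Phi Phi^* = [[2, 1], [1, 2]].
PHI = [[1, 0, -1],
       [0, 1, -1]]

def frame_operator(P):
    """Phi Phi^* for a real 2 x M synthesis matrix."""
    return [[sum(P[i][k] * P[j][k] for k in range(len(P[0]))) for j in range(2)]
            for i in range(2)]

def frame_bounds(S):
    """Extreme eigenvalues of the real symmetric 2x2 matrix S."""
    mean = (S[0][0] + S[1][1]) / 2
    rad = math.hypot((S[0][0] - S[1][1]) / 2, S[0][1])
    return mean - rad, mean + rad

def ratio(theta):
    """sum_k |<x, phi_k>|^2 / ||x||^2 for the unit vector x at angle theta."""
    x = (math.cos(theta), math.sin(theta))
    return sum((x[0] * PHI[0][k] + x[1] * PHI[1][k]) ** 2 for k in range(3))

S = frame_operator(PHI)
lo, hi = frame_bounds(S)
samples = [ratio(2 * math.pi * n / 3600) for n in range(3600)]
print(S, lo, hi, min(samples), max(samples))
```

The grid search only approximates the infimum and supremum, whereas the eigenvalues give them exactly, which is the point made in the text.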

In many ways, including expansion and inner product computation, frames play the same roles as bases. When a frame lacks linear independence, it cannot induce a subspace decomposition because of the uniqueness requirement in Definition 1.30. The connection between frames and projections is more subtle. We now develop these ideas further, covering the special case of tight frames first.

Tight Frames 



Definition 1.49 (Tight frame) The frame $\Phi = \{\varphi_k\}_{k\in\mathcal{J}} \subset H$, where $\mathcal{J}$ is finite or countably infinite, is called a tight frame, or a $\lambda$-tight frame, for Hilbert space $H$ when its frame bounds are equal, $\lambda_{\min} = \lambda_{\max} = \lambda$.

For a $\lambda$-tight frame, (1.135) simplifies to

$$\Phi\Phi^* = \lambda I. \tag{1.136}$$

A tight frame is a counterpart of an orthonormal basis, as we will see shortly.

Example 1.46 (Finite-dimensional tight frame) Take the following three vectors as a frame for $\mathbb{R}^2$:

$$\varphi_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \varphi_1 = \begin{bmatrix} -\tfrac{1}{2} \\ \tfrac{\sqrt{3}}{2} \end{bmatrix}, \quad \varphi_2 = \begin{bmatrix} -\tfrac{1}{2} \\ -\tfrac{\sqrt{3}}{2} \end{bmatrix}, \tag{1.137a}$$

$$\Phi = \begin{bmatrix} 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & \tfrac{\sqrt{3}}{2} & -\tfrac{\sqrt{3}}{2} \end{bmatrix}. \tag{1.137b}$$

Computing its frame bounds as we did in Example 1.45, we find that

$$\Phi\Phi^* = \tfrac{3}{2} I,$$

and thus, the eigenvalues are $\lambda_{\min} = \lambda_{\max} = 3/2$, and the frame is tight. Note that this frame is just a normalized version of the one in (1.15), which is a 1-tight frame.

Frames that are 1-tight are called Parseval tight frames. We can normalize any $\lambda$-tight frame by pulling $1/\sqrt{\lambda}$ into the sum in (1.132) to yield a 1-tight frame,

$$\sum_k |\langle x, \lambda^{-1/2}\varphi_k\rangle|^2 = \frac{1}{\lambda}\sum_k |\langle x, \varphi_k\rangle|^2 = \|x\|^2. \tag{1.138}$$

Because of this normalization, we can associate a 1-tight frame to any tight frame. Note that orthonormal bases are 1-tight frames with all unit-norm vectors. In general, the vectors in a 1-tight frame do not have unit norms or even equal norms.

Expansion and Inner Product Computation Expansion coefficients with respect 
to a 1-tight frame can be obtained by using the same 1-tight frame for signal analysis. 



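Theorem 1.50 (Tight frame expansions) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{J}}$ be a 1-tight frame for Hilbert space $H$. Analysis of any $x$ in $H$ gives expansion coefficients in $\ell^2(\mathcal{J})$

$$\alpha_k = \langle x, \varphi_k\rangle \quad \text{for } k \in \mathcal{J}, \quad \text{or} \tag{1.139a}$$

$$\alpha = \Phi^* x. \tag{1.139b}$$

Synthesis with these coefficients yields

$$x = \sum_{k\in\mathcal{J}} \langle x, \varphi_k\rangle\varphi_k \tag{1.140a}$$

$$= \Phi\alpha = \Phi\Phi^* x. \tag{1.140b}$$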



Note the apparent similarity of this theorem to Theorem 1.38. The equations in these theorems are identical, and each theorem shows that analysis and synthesis with the same set of vectors yields an identity on $H$. In the orthonormal basis case, the expansion is unique; in the 1-tight frame case it is generally not. The $\alpha$ given by (1.139b) can be replaced by any $\alpha' = \alpha + \alpha^\perp$, where $\alpha^\perp$ is in the null space of $\Phi$, while maintaining $x = \Phi\alpha'$.

The theorem follows from two simple facts: $\alpha \in \ell^2(\mathcal{J})$ because $\Phi^*$ is a bounded operator; and

$$\Phi\Phi^* = I \quad \text{on } H, \tag{1.141}$$

by setting $\lambda = 1$ in (1.136). This leads to Parseval's equalities for tight frames:

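Theorem 1.51 (Parseval's equalities for 1-tight frames) Let $\Phi = \{\varphi_k\}_{k\in\mathcal{J}}$ be a 1-tight frame for Hilbert space $H$. Expansion with coefficients (1.139) satisfies

$$\|x\|^2 = \sum_{k\in\mathcal{J}} |\langle x, \varphi_k\rangle|^2 \tag{1.142a}$$

$$= \|\Phi^* x\|^2 = \|\alpha\|^2. \tag{1.142b}$$

More generally,

$$\langle x, y\rangle = \sum_{k\in\mathcal{J}} \langle x, \varphi_k\rangle\langle y, \varphi_k\rangle^* \tag{1.143a}$$

$$= \langle \Phi^* x, \Phi^* y\rangle = \langle \alpha, \beta\rangle. \tag{1.143b}$$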



Again, this theorem looks formally the same as Theorem 1.39. Beyond what we have already mentioned ($\Phi$ is a linearly dependent set of vectors and the expansion is nonunique), this theorem hides even more. For example, the norm-preservation property could be misleading; the frame in the theorem is 1-tight, and thus, even if its elements all have the same norm, that norm is generally not 1.

Example 1.47 (Parseval's equalities for tight frames) Let us continue with the frame from (1.15). Its vectors are all of norm $\sqrt{2/3}$. Normalizing it so that all of its vectors are of unit norm yields the frame in (1.137). Computing the norm of the expansion coefficients $\|\alpha\|^2$ for this frame yields

$$\|\Phi^* x\|^2 = x_0^2 + \left(-\tfrac{1}{2}x_0 + \tfrac{\sqrt{3}}{2}x_1\right)^2 + \left(-\tfrac{1}{2}x_0 - \tfrac{\sqrt{3}}{2}x_1\right)^2 = \tfrac{3}{2}\|x\|^2.$$

This tells us that for this tight frame with all unit-norm vectors, the squared norm of the expansion coefficients is 3/2 times that of the vector itself. This is intuitive as we have 3/2 times as many vectors as needed for an expansion in $\mathbb{R}^2$.

This example generalizes to all finite-dimensional tight frames with unit-norm vectors. For such frames, the factor appearing in Parseval's equality denotes the redundancy of the frame.
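The redundancy interpretation can be checked numerically. The plain-Python sketch below uses, as an assumed family for illustration, $M$ equally spaced unit vectors in $\mathbb{R}^2$ (for $M = 3$ these are exactly the vectors of (1.137)); for $M \ge 3$ such a frame is tight with bound $M/2$, the number of vectors divided by the dimension. It also rescales the $M = 3$ frame by $\sqrt{2/3}$, which the text describes as the 1-tight frame of (1.15).

```python
import math

def harmonic_frame(M):
    """M unit vectors in R^2 at angles 2*pi*k/M; M = 3 gives the frame of (1.137)."""
    return [(math.cos(2 * math.pi * k / M), math.sin(2 * math.pi * k / M))
            for k in range(M)]

def frame_operator(vecs):
    """Phi Phi^* = sum_k phi_k phi_k^T as a 2x2 nested list."""
    return [[sum(v[i] * v[j] for v in vecs) for j in range(2)] for i in range(2)]

def coeff_energy(vecs, x):
    """||Phi^* x||^2 = sum_k <x, phi_k>^2 (real case)."""
    return sum((x[0] * v[0] + x[1] * v[1]) ** 2 for v in vecs)

x = (0.3, -1.7)                      # arbitrary test vector
for M in (3, 5):
    vecs = harmonic_frame(M)
    S = frame_operator(vecs)
    ratio = coeff_energy(vecs, x) / (x[0] ** 2 + x[1] ** 2)
    print(M, [[round(s, 12) for s in row] for row in S], round(ratio, 12))

# Rescaling the M = 3 frame by sqrt(2/3) gives a 1-tight (Parseval) frame.
parseval = [(math.sqrt(2 / 3) * a, math.sqrt(2 / 3) * b) for a, b in harmonic_frame(3)]
print(round(coeff_energy(parseval, x) / (x[0] ** 2 + x[1] ** 2), 12))
```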

Inverse Synthesis and Analysis For a 1-tight frame, (1.141) shows that the synthesis operator is a left inverse of the analysis operator. Unlike with an orthonormal basis, the synthesis operator associated with a 1-tight frame is generally not a right inverse (hence, not an inverse) of the analysis operator because $\Phi^*\Phi \neq I$. In finite dimensions, this can be seen easily from the rank of $\Phi^*\Phi$; the rank of $\Phi^*\Phi$ is the dimension of $H$, but $\Phi^*\Phi$ is an operator on $\mathbb{C}^{|\mathcal{J}|}$, where $|\mathcal{J}|$ may be larger than the dimension of $H$.

Example 1.48 (Inverse relationship for frame operators) Continue with 
the 1-tight frame from ( 11.15) . We have already seen that $$* = -?2x2- We also 



have 



$*$ 



2/3 


-1/3 


-1/3 


1/3 


2/3 


-1/3 


1/3 


-1/3 


2/3 



* I: 



3x3- 



The rank of $*$ is 2. 
Orthogonal Projection Since a frame for $H$ generally has more than the minimum number of vectors needed to span $H$, omitting some terms from the synthesis sum (1.140) does not necessarily restrict the result to a proper subspace of $H$. Thus, a frame (even a 1-tight frame) does not yield a result analogous to Theorem 1.40 for computing orthogonal projections on $H$.

A different orthogonal projection property is easy to verify for any 1-tight frame: $\Phi^*\Phi : \ell^2(\mathcal{J}) \to \ell^2(\mathcal{J})$ is the orthogonal projection onto $\mathcal{R}(\Phi^*)$. This has important consequences for robustness to noise that are developed fully in later chapters.
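The projection property can be checked directly, and it hints at the noise-robustness claim: perturbing the coefficients $\Phi^*x$ and then applying $\Phi^*\Phi$ cannot increase the norm of the perturbation. This sketch assumes the frame (1.15) is $\varphi_0 = [\sqrt{2/3}\ \ 0]^T$, $\varphi_1 = [-1/\sqrt{6}\ \ 1/\sqrt{2}]^T$, $\varphi_2 = [-1/\sqrt{6}\ \ -1/\sqrt{2}]^T$; the vector `e` is an arbitrary fixed perturbation standing in for noise, not data from the text.

```python
import math

# Assumed form of the 1-tight frame (1.15).
s = 1 / math.sqrt(6)
cols = [(math.sqrt(2 / 3), 0.0), (-s, 1 / math.sqrt(2)), (-s, -1 / math.sqrt(2))]

# P = Phi^* Phi, entry (i, k) = <phi_k, phi_i>.
P = [[sum(a * b for a, b in zip(cols[k], cols[i])) for k in range(3)] for i in range(3)]

def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

# Orthogonal projection: P is idempotent and symmetric.
P2 = [[sum(P[i][j] * P[j][k] for j in range(3)) for k in range(3)] for i in range(3)]
idem_err = max(abs(P2[i][k] - P[i][k]) for i in range(3) for k in range(3))
sym_err = max(abs(P[i][k] - P[k][i]) for i in range(3) for k in range(3))

x = (0.8, -1.3)
alpha = [x[0] * c[0] + x[1] * c[1] for c in cols]   # alpha = Phi^* x
e = [0.1, -0.2, 0.05]                               # fixed perturbation standing in for noise
cleaned = apply(P, [a + d for a, d in zip(alpha, e)])

err_before = norm(e)
err_after = norm([c - a for c, a in zip(cleaned, alpha)])
print(idem_err, sym_err)      # both at rounding level
print(err_before, err_after)  # err_after does not exceed err_before
```

For this particular frame, the Gram matrix in Example 1.48 equals $I - \tfrac{1}{3}\mathbf{1}\mathbf{1}^T$, so the projection removes exactly the component of the perturbation along $[1\ 1\ 1]^T$, a vector in the null space of $\Phi$.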



General Frames 



Tight frames are a small class of frames as defined in Definition 1.48. A frame may generally have frame bounds that differ, and the gap between them gives us information about the quality of the frame.



Dual Frame Pairs and Expansion When a frame $\Phi$ is not 1-tight, finding expansion coefficients with respect to $\Phi$ with a linear operator requires a second frame, in analogy to biorthogonal pairs of bases.



Definition 1.52 (Dual pair of frames) The sets of vectors $\Phi = \{\varphi_k\}_{k\in\mathcal{J}} \subset H$ and $\widetilde{\Phi} = \{\widetilde{\varphi}_k\}_{k\in\mathcal{J}} \subset H$, where $\mathcal{J}$ is finite or countably infinite, are called a dual pair of frames for Hilbert space $H$ when

(i) each is a frame for $H$; and

(ii) for any $x$ in $H$,

$$x = \sum_{k\in\mathcal{J}} \langle x, \widetilde{\varphi}_k\rangle\,\varphi_k \tag{1.144a}$$

$$x = \Phi\widetilde{\Phi}^* x. \tag{1.144b}$$

Note that this definition combines the roles of Definition 1.42 and Theorem 1.43 for general frames. This is necessary because no simple pairwise condition between vectors like (1.102) will imply (1.144).



Example 1.49 (Dual pairs of frames in $\mathbb{R}^2$) Let $\Phi$ be the frame for $\mathbb{R}^2$ defined in (OM). Since synthesis operator $\Phi$ is a $2\times 3$ matrix with rank 2, it has infinitely many right inverses; any right inverse specifies a frame that forms a dual pair with $\Phi$. Examples include the following:



1 












}■ 


1 







-1 




-1 


}■ 


{ 


2/3 




-1/3 




-1/3 


1 


3 


2 


1 


1 


1 


' 


2 


) 


1 


-1/3 


) 


2/3 


t 


-1/3 



}■ 



The first of these examples demonstrates that a frame can have collinear elements; a frame can furthermore include the same vector multiple times!²² The third of these examples is the canonical dual, which will be defined shortly.

Inner Product Computation Suppose $\Phi$ is a frame for $H$, and $x = \Phi\alpha$ and $y = \Phi\beta$. Then, just as in (1.113), we can write

$$\langle x, y\rangle = \langle \Phi\alpha, \Phi\beta\rangle = \langle G\alpha, \beta\rangle = \beta^* G\alpha,$$

where $G = \Phi^*\Phi$ is the Gram matrix defined in (1.112). This shows how to use frame expansion coefficients to convert an inner product in $H$ to an inner product in $\ell^2(\mathcal{J})$.

The key difference between the Gram matrix of a frame and the Gram matrix 
of a basis is that G is now not necessarily invertible. In fact, it is an invertible 
bounded operator if and only if the frame is a Riesz basis. 



Inverse Analysis and Synthesis Condition (1.144b) shows that if sets $\Phi$ and $\widetilde{\Phi}$ are a dual pair of frames, synthesis operator $\Phi$ is a left inverse of analysis operator $\widetilde{\Phi}^*$. As we saw before with 1-tight frames, $\Phi$ is generally not a right inverse (hence, not an inverse) of $\widetilde{\Phi}^*$ because $\widetilde{\Phi}^*\Phi \neq I$.

The roles of the two frames in a dual pair of frames can be reversed, so

$$x = \sum_{k\in\mathcal{J}} \langle x, \varphi_k\rangle\,\widetilde{\varphi}_k \tag{1.145a}$$

$$x = \widetilde{\Phi}\Phi^* x. \tag{1.145b}$$

Thus synthesis operator $\widetilde{\Phi}$ is a left inverse of analysis operator $\Phi^*$. This and several other elementary properties of dual pairs of frames are established in Exercise 1.52.

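Oblique Projection If sets $\Phi$ and $\widetilde{\Phi}$ are a dual pair of frames, the operator $P = \widetilde{\Phi}^*\Phi$ is a projection operator. Checking idempotency of $P$ is straightforward:

$$P^2 = (\widetilde{\Phi}^*\Phi)(\widetilde{\Phi}^*\Phi) = \widetilde{\Phi}^*(\Phi\widetilde{\Phi}^*)\Phi \overset{(a)}{=} \widetilde{\Phi}^* I\,\Phi = \widetilde{\Phi}^*\Phi = P, \tag{1.146}$$

where (a) follows from synthesis operator $\Phi$ being a left inverse of analysis operator $\widetilde{\Phi}^*$.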
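To see that $P$ need not be self-adjoint, here is a small exact computation. It uses a hypothetical frame for $\mathbb{R}^2$, $\{[1\ 0]^T, [0\ 1]^T, [1\ 1]^T\}$, together with one of its non-canonical duals, $\{[1\ 0]^T, [0\ 1]^T, [0\ 0]^T\}$; both are illustrative assumptions of this sketch, not taken from the text. The code confirms $\Phi\widetilde{\Phi}^* = I$, then forms $P = \widetilde{\Phi}^*\Phi$ and checks that it is idempotent but not symmetric, hence an oblique rather than orthogonal projection.

```python
# Hypothetical frame for R^2 and one non-canonical dual (illustration only).
PHI = [(1, 0), (0, 1), (1, 1)]
DUAL = [(1, 0), (0, 1), (0, 0)]

def synthesis_product(left, right):
    """Left Right^* = sum_k left_k right_k^T for real vectors in R^2 (a 2 x 2 matrix)."""
    return [[sum(a[m] * b[n] for a, b in zip(left, right)) for n in range(2)] for m in range(2)]

def coefficient_map(dual, frame):
    """P = dual^* frame: entry (i, k) = <phi_k, phi~_i>."""
    return [[sum(f * d for f, d in zip(frame[k], dual[i])) for k in range(len(frame))]
            for i in range(len(dual))]

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]

print(synthesis_product(PHI, DUAL))   # [[1, 0], [0, 1]]: Phi Phi~^* = I, so (1.144b) holds
P = coefficient_map(DUAL, PHI)
print(P)                              # [[1, 0, 1], [0, 1, 1], [0, 0, 0]]
print(matmul(P, P) == P)              # True: P is idempotent
print(all(P[i][k] == P[k][i] for i in range(3) for k in range(3)))  # False: not self-adjoint
```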

Canonical Dual Frame So far we have derived properties of a dual pair of frames without regard for how to find such a pair. Given one frame $\Phi$, there are many frames $\widetilde{\Phi}$ that complete a dual pair with $\Phi$. There is a unique choice called the canonical dual frame²³ that is important because it leads to an orthogonal projection operator on $\ell^2(\mathcal{J})$.



22 Allowing multiplicities generalizes the concept of set to multisets, but we will continue to use 
the simpler term. 

23 Some authors use "dual" to mean "canonical dual." We will not adopt this potentially- 
confusing shorthand because it obscures the possible advantages that come from flexibility in the 
choice of a dual. 
For sets $\Phi$ and $\widetilde{\Phi}$ to form a dual pair of frames requires the associated operators to satisfy $\Phi\widetilde{\Phi}^* = I$ on $H$; see (1.144b). As established in (1.146), this makes $P = \widetilde{\Phi}^*\Phi$ a projection operator. When, in addition, $P$ is self-adjoint, it is an orthogonal projection operator. Setting

$$\widetilde{\Phi} = (\Phi\Phi^*)^{-1}\Phi \tag{1.147a}$$

satisfies (1.144b), since $\Phi\widetilde{\Phi}^* = \Phi\Phi^*(\Phi\Phi^*)^{-1} = I$, and yields

$$P = \widetilde{\Phi}^*\Phi = \big((\Phi\Phi^*)^{-1}\Phi\big)^*\Phi = \Phi^*(\Phi\Phi^*)^{-1}\Phi,$$

which is self-adjoint. From (1.147a), the elements of the canonical dual are

$$\widetilde{\varphi}_k = (\Phi\Phi^*)^{-1}\varphi_k, \qquad k\in\mathcal{J}. \tag{1.147b}$$

When $H = \mathbb{C}^N$ (or $\mathbb{R}^N$), the computations in (1.147) are straightforward; for example, the third dual frame in Example 1.49 is a canonical dual. In general, it is difficult to make these computations without first expressing the linear operator $\Phi\Phi^*$ using a basis.
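In $\mathbb{R}^2$ the computation (1.147b) only requires inverting the $2\times 2$ matrix $\Phi\Phi^*$. The following sketch does this in exact rational arithmetic for a hypothetical frame $\{[1\ 0]^T, [0\ 1]^T, [1\ 1]^T\}$ (an illustrative choice, not necessarily the frame used in Example 1.49), then confirms that the resulting dual satisfies (1.144b) and that $P = \widetilde{\Phi}^*\Phi$ is self-adjoint.

```python
from fractions import Fraction as F

# Hypothetical frame for R^2 (illustration only).
PHI = [(F(1), F(0)), (F(0), F(1)), (F(1), F(1))]

def frame_operator(frame):
    """S = Phi Phi^* = sum_k phi_k phi_k^T (2 x 2, real case)."""
    return [[sum(p[m] * p[n] for p in frame) for n in range(2)] for m in range(2)]

def inverse_2x2(S):
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def canonical_dual(frame):
    """phi~_k = (Phi Phi^*)^{-1} phi_k, as in (1.147b)."""
    S_inv = inverse_2x2(frame_operator(frame))
    return [tuple(S_inv[m][0] * p[0] + S_inv[m][1] * p[1] for m in range(2)) for p in frame]

dual = canonical_dual(PHI)
# Phi Phi~^* = sum_k phi_k phi~_k^T should be the identity, i.e., (1.144b).
recon = [[sum(p[m] * d[n] for p, d in zip(PHI, dual)) for n in range(2)] for m in range(2)]
# P = Phi~^* Phi, entry (i, k) = <phi_k, phi~_i>.
P = [[sum(a * b for a, b in zip(PHI[k], dual[i])) for k in range(3)] for i in range(3)]

print([tuple(str(v) for v in d) for d in dual])  # [('2/3', '-1/3'), ('-1/3', '2/3'), ('1/3', '1/3')]
print(recon == [[1, 0], [0, 1]])                                    # True
print(all(P[i][k] == P[k][i] for i in range(3) for k in range(3)))  # True
```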

1.5.5 Matrix Representations of Vectors and Linear Operators 

A basis for $H$ creates a one-to-one correspondence between vectors in $H$ and sequences in $\ell^2(\mathcal{K})$. As discussed in Section 1.5.2, an orthonormal basis preserves geometry (inner products) in this correspondence; see Figure 1.19. Even without orthonormality, using a basis is a key step toward computational feasibility because a basis allows us to do all computations with sequences. Here our intuition from finite dimensions may get in the way of appreciating what we have gained because we take the basis for granted. Computations in the Hilbert spaces $\mathbb{C}^N$ are relatively straightforward in part because we use the standard basis automatically. Computations in other Hilbert spaces can be considerably more complicated; for example, integrating to compute an $\mathcal{L}^2(\mathbb{R})$ inner product can be difficult. With sequences, the greatest difficulty is that if the space is infinite dimensional, the computation may require some truncation. Limiting our attention to vectors with finite $\ell^2(\mathcal{K})$ norm ensures that the truncation can be done with small relative error; details are deferred to Chapter 5.

We get the most benefit from our experience with finite-dimensional linear algebra by thinking of sequences in $\ell^2(\mathcal{K})$ as (possibly-infinite) column vectors. A linear operator can then be represented with ordinary matrix-vector multiplication by a (possibly-infinite) matrix. One goal in the choice of bases for the domain and codomain of the operator is to make this matrix simple. Like a basis, a frame also enables representations using sequences, but the lack of uniqueness of the representations creates some additional intricacies; these are explored through exercises.

Change of Basis: Orthonormal Bases Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ and $\Psi = \{\psi_k\}_{k\in\mathcal{K}}$ be orthonormal bases for Hilbert space $H$. Since bases provide unique representations, for any $x$ in $H$ we can use synthesis operators to write $x = \Phi\alpha$ and $x = \Psi\beta$ for unique $\alpha$ and $\beta$ in $\ell^2(\mathcal{K})$. The operator $C_{\Phi,\Psi} : \ell^2(\mathcal{K}) \to \ell^2(\mathcal{K})$ that maps $\alpha$ to $\beta$ is a change of basis from $\Phi$ to $\Psi$.

Figure 1.27: An orthonormal basis in $\mathbb{R}^2$ generated by rotation of the standard basis.

Since $\Psi$ has inverse $\Psi^*$, we could simply write $C_{\Phi,\Psi} = \Psi^*\Phi$. This solves our problem because

$$C_{\Phi,\Psi}\alpha = (\Psi^*\Phi)\alpha = \Psi^*(\Phi\alpha) = \Psi^* x = \beta.$$



In a finite-dimensional setting, this is a perfectly adequate solution because we know how to interpret $\Psi^*\Phi$ as a product of matrices. We illustrate this in the following example.

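Example 1.50 (Change of basis by rotation) Let $\{\varphi_0, \varphi_1\}$ be the basis for $\mathbb{R}^2$ shown in Figure 1.27,

$$\varphi_0 = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad \varphi_1 = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$

Let $\{\psi_0, \psi_1\}$ be the standard basis for $\mathbb{R}^2$. The change of basis matrix from $\Phi$ to $\Psi$ is

$$C_{\Phi,\Psi} = \Psi^*\Phi = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Consider the vector in $\mathbb{R}^2$ that has representation $\alpha = [1\ \ 0]^T$ with respect to $\Phi$ (not with respect to the standard basis). This means that the vector is

$$1\cdot\varphi_0 + 0\cdot\varphi_1 = \varphi_0 = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix},$$

where the final expression is with respect to the standard basis $\Psi$. This agrees with the product $C_{\Phi,\Psi}\alpha$.

Multiplying by $C_{\Phi,\Psi}$ is a counterclockwise rotation by angle $\theta$. This agrees with the fact that the basis $\Phi$ is the standard basis $\Psi$ rotated counterclockwise by $\theta$.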
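Example 1.50 can be reproduced by building the change of basis matrix entry by entry from inner products, $C_{\Phi,\Psi}[i][k] = \langle \varphi_k, \psi_i\rangle$, the form derived in (1.148) below for orthonormal bases. In this sketch the angle $\theta = \pi/6$ and the helper names are arbitrary choices.

```python
import math

def rotated_basis(theta):
    """Standard basis of R^2 rotated counterclockwise by theta (Example 1.50)."""
    return [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]

def change_of_basis(old, new):
    """C[i][k] = <old_k, new_i>; valid when `new` is an orthonormal basis."""
    return [[sum(a * b for a, b in zip(old[k], new[i])) for k in range(len(old))]
            for i in range(len(new))]

def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

theta = math.pi / 6
Phi = rotated_basis(theta)
Psi = [(1.0, 0.0), (0.0, 1.0)]                        # standard basis

beta = apply(change_of_basis(Phi, Psi), [1.0, 0.0])   # alpha = [1, 0] w.r.t. Phi
alpha_back = apply(change_of_basis(Psi, Phi), beta)   # back from Psi to Phi
print(beta)         # approximately [0.866, 0.5] = [cos(theta), sin(theta)]
print(alpha_back)   # approximately [1.0, 0.0]
```

Going back from $\Psi$ to $\Phi$ uses the transpose, a rotation by $-\theta$, which is valid here because $\Phi$ is also orthonormal.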
In the previous example, since multiplication by a $2\times 2$ matrix is simple, it makes little difference whether we interpret $\Psi^*\Phi$ as a composition of two operators or as a single operator. In general, $C_{\Phi,\Psi}$ should not be implemented as a composition of $\Phi$ followed by $\Psi^*$ because we do not want to return to computations in $H$, which may be more complicated than computations on coefficient sequences. Instead, we would like to think of $C_{\Phi,\Psi}$ as a $|\mathcal{K}|\times|\mathcal{K}|$ matrix, even if $|\mathcal{K}|$ is not finite.

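Because of linearity, we can form the matrix $C_{\Phi,\Psi}$ by finding $C_{\Phi,\Psi}\alpha$ for particular values of $\alpha$. Let $\alpha = \delta_k$ for some $k\in\mathcal{K}$, using the Kronecker delta notation from (1.9). Then $x = \Phi\alpha = \varphi_k$. Since $\Psi$ is an orthonormal basis, the unique expansion of $x$ with respect to $\Psi$ is

$$x = \sum_{i\in\mathcal{K}} \langle x, \psi_i\rangle\,\psi_i = \sum_{i\in\mathcal{K}} \langle \varphi_k, \psi_i\rangle\,\psi_i,$$

from which we read off the $i$th coefficient of $\beta$ as $\beta_i = \langle \varphi_k, \psi_i\rangle$. This implies that column $k$ of matrix $C_{\Phi,\Psi}$ is $\{\langle \varphi_k, \psi_i\rangle\}_{i\in\mathcal{K}}$. The full matrix, written for the case of $\mathcal{K} = \mathbb{Z}$, is

$$C_{\Phi,\Psi} = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \langle\varphi_{-1},\psi_{-1}\rangle & \langle\varphi_0,\psi_{-1}\rangle & \langle\varphi_1,\psi_{-1}\rangle & \cdots \\ \cdots & \langle\varphi_{-1},\psi_0\rangle & \langle\varphi_0,\psi_0\rangle & \langle\varphi_1,\psi_0\rangle & \cdots \\ \cdots & \langle\varphi_{-1},\psi_1\rangle & \langle\varphi_0,\psi_1\rangle & \langle\varphi_1,\psi_1\rangle & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{1.148}$$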



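Example 1.51 (Change to standard basis) Let $\Phi = \{\varphi_k\}_{k\in\mathbb{Z}}$ be any orthonormal basis for $\ell^2(\mathbb{Z})$, and let $\Psi$ be the standard basis for $\ell^2(\mathbb{Z})$. Then for any integers $k$ and $i$,

$$\langle \varphi_k, \psi_i\rangle = \varphi_{k,i},$$

the $i$th-indexed entry of the $k$th vector of $\Phi$. The change of basis operator (1.148) simplifies to

$$C_{\Phi,\Psi} = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \varphi_{-1,-1} & \varphi_{0,-1} & \varphi_{1,-1} & \cdots \\ \cdots & \varphi_{-1,0} & \varphi_{0,0} & \varphi_{1,0} & \cdots \\ \cdots & \varphi_{-1,1} & \varphi_{0,1} & \varphi_{1,1} & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix},$$

a matrix with the initial basis elements as columns.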



Change of Basis: Biorthogonal Pairs of Bases We now derive the change of basis operator without assuming that the bases are orthonormal. Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ and $\Psi = \{\psi_k\}_{k\in\mathcal{K}}$ be bases for Hilbert space $H$. For any $x$ in $H$, we can again write $x = \Phi\alpha$ and $x = \Psi\beta$ for unique $\alpha$ and $\beta$ in $\ell^2(\mathcal{K})$.

Since $\Psi$ must be invertible, we could simply write $C_{\Phi,\Psi} = \Psi^{-1}\Phi$ because then

$$C_{\Phi,\Psi}\alpha = (\Psi^{-1}\Phi)\alpha = \Psi^{-1}(\Phi\alpha) = \Psi^{-1}x = \beta.$$
As before, we would not want to implement $C_{\Phi,\Psi}$ as a composition of two operators where the first returns computations to $H$. Here we have the additional complication that $\Psi^{-1}$ may be difficult to interpret.

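Because of linearity, we can form the matrix $C_{\Phi,\Psi}$ by finding $C_{\Phi,\Psi}\alpha$ for particular values of $\alpha$. Let $\alpha = \delta_k$ for some $k\in\mathcal{K}$. Then $x = \Phi\alpha = \varphi_k$. If $\Psi$ and $\widetilde{\Psi}$ form a biorthogonal pair of bases for $H$, the unique expansion of $x$ with respect to $\Psi$ is

$$x = \sum_{i\in\mathcal{K}} \langle x, \widetilde{\psi}_i\rangle\,\psi_i = \sum_{i\in\mathcal{K}} \langle \varphi_k, \widetilde{\psi}_i\rangle\,\psi_i,$$

from which we read off the $i$th coefficient of $\beta$ as $\beta_i = \langle \varphi_k, \widetilde{\psi}_i\rangle$. This implies that column $k$ of matrix $C_{\Phi,\Psi}$ is $\{\langle \varphi_k, \widetilde{\psi}_i\rangle\}_{i\in\mathcal{K}}$. The full matrix, written for the case of $\mathcal{K} = \mathbb{Z}$, is

$$C_{\Phi,\Psi} = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \langle\varphi_{-1},\widetilde{\psi}_{-1}\rangle & \langle\varphi_0,\widetilde{\psi}_{-1}\rangle & \langle\varphi_1,\widetilde{\psi}_{-1}\rangle & \cdots \\ \cdots & \langle\varphi_{-1},\widetilde{\psi}_0\rangle & \langle\varphi_0,\widetilde{\psi}_0\rangle & \langle\varphi_1,\widetilde{\psi}_0\rangle & \cdots \\ \cdots & \langle\varphi_{-1},\widetilde{\psi}_1\rangle & \langle\varphi_0,\widetilde{\psi}_1\rangle & \langle\varphi_1,\widetilde{\psi}_1\rangle & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{1.149}$$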



Note that $C_{\Phi,\Psi}$ depends on only one dual: the dual of the new representation basis $\Psi$. If the dual $\widetilde{\Psi}$ were not already available, computation of $C_{\Phi,\Psi}$ could be written in terms of the inner products in (1.148) and the Gram matrix of $\Psi$.
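The following sketch illustrates (1.149) in $\mathbb{R}^2$ with exact rational arithmetic. The bases $\Phi = \{[2\ 1]^T, [0\ 3]^T\}$ and $\Psi = \{[1\ 0]^T, [1\ 1]^T\}$ are hypothetical choices, and the dual $\widetilde{\Psi}$ is obtained as the rows of $\Psi^{-1}$, which is valid for a basis of a finite-dimensional space. As noted above, only the dual of the new basis $\Psi$ is needed.

```python
from fractions import Fraction as F

def dual_basis_2x2(basis):
    """Dual of a basis of R^2: the rows of Psi^{-1}, where Psi has the basis vectors as columns."""
    (a, c), (b, d) = basis          # Psi = [[a, b], [c, d]]
    det = a * d - b * c
    return [(d / det, -b / det), (-c / det, a / det)]

def change_of_basis(old, new_dual):
    """C[i][k] = <old_k, new~_i>, as in (1.149)."""
    return [[sum(x * y for x, y in zip(old[k], new_dual[i])) for k in range(len(old))]
            for i in range(len(new_dual))]

def synthesize(basis, coeffs):
    """Sum_k coeffs[k] * basis[k] in R^2."""
    return tuple(sum(c * v[m] for c, v in zip(coeffs, basis)) for m in range(2))

Phi = [(F(2), F(1)), (F(0), F(3))]   # hypothetical old basis
Psi = [(F(1), F(0)), (F(1), F(1))]   # hypothetical new basis, not orthonormal
C = change_of_basis(Phi, dual_basis_2x2(Psi))

alpha = [F(1), F(2)]
beta = [sum(C[i][k] * alpha[k] for k in range(2)) for i in range(2)]
print(beta == [-5, 7])                                  # True
print(synthesize(Phi, alpha) == synthesize(Psi, beta))  # True: both give x = (2, 7)
```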



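Matrix Representation of Linear Operator with Orthonormal Bases Consider a Hilbert space $H$ with orthonormal basis $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$, and let $A : H \to H$ be a linear operator. A matrix representation $\Gamma$ allows $A$ to be computed directly on coefficient sequences in the following sense: If

$$y = Ax, \tag{1.150a}$$

where

$$x = \sum_{k\in\mathcal{K}} \alpha_k\varphi_k \tag{1.150b}$$

and

$$y = \sum_{k\in\mathcal{K}} \beta_k\varphi_k, \tag{1.150c}$$

then

$$\beta = \Gamma\alpha. \tag{1.150d}$$

These relationships are depicted in Figure 1.28.

Figure 1.28: Conceptual illustration of computing a linear operator $A : H \to H$ using a matrix multiplication $\Gamma : \ell^2(\mathcal{K}) \to \ell^2(\mathcal{K})$. The abstract $y = Ax$ can be replaced by the more concrete $\beta = \Gamma\alpha$, where $\alpha$ and $\beta$ are the representations of $x$ and $y$ with respect to orthonormal basis $\Phi$ of $H$.

To find the matrix representation, note that the $k$th coefficient of the expansion of $y$ with respect to $\Phi$ is

$$\beta_k \overset{(a)}{=} \langle y, \varphi_k\rangle \overset{(b)}{=} \langle Ax, \varphi_k\rangle \overset{(c)}{=} \Big\langle A\Big(\sum_{i\in\mathcal{K}}\alpha_i\varphi_i\Big), \varphi_k\Big\rangle \overset{(d)}{=} \Big\langle \sum_{i\in\mathcal{K}}\alpha_i A\varphi_i, \varphi_k\Big\rangle \overset{(e)}{=} \sum_{i\in\mathcal{K}}\alpha_i\langle A\varphi_i, \varphi_k\rangle, \tag{1.151}$$

where (a) follows from the expression for expansion coefficients in an orthonormal basis, (1.84a); (b) from (1.150a); (c) from (1.150b); (d) from the linearity of $A$; and (e) from the linearity in the first argument of the inner product. This computation of one component of $\beta$ as a linear combination of components of $\alpha$ determines one row of the matrix $\Gamma$. By gathering the equations (1.151) for all $k\in\mathcal{K}$ (and assuming $\mathcal{K} = \mathbb{Z}$ for concreteness), we obtain

$$\Gamma = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \langle A\varphi_{-1},\varphi_{-1}\rangle & \langle A\varphi_0,\varphi_{-1}\rangle & \langle A\varphi_1,\varphi_{-1}\rangle & \cdots \\ \cdots & \langle A\varphi_{-1},\varphi_0\rangle & \langle A\varphi_0,\varphi_0\rangle & \langle A\varphi_1,\varphi_0\rangle & \cdots \\ \cdots & \langle A\varphi_{-1},\varphi_1\rangle & \langle A\varphi_0,\varphi_1\rangle & \langle A\varphi_1,\varphi_1\rangle & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{1.152}$$

To check that (1.152) makes sense in a simple special case, let $H = \mathbb{C}^N$ and let $\Phi$ be the standard basis. For any $k$ and $i$ in $\{0, 1, \ldots, N-1\}$,

$$\Gamma_{i,k} = \langle A\varphi_k, \varphi_i\rangle = A_{i,k}$$

because $A\varphi_k$ is the $k$th column of $A$, and taking the inner product with $\varphi_i$ picks out the $i$th entry. Thus the conventional use of matrices for linear operators on $\mathbb{C}^N$ is consistent with (1.152). This extends also to the use of the standard basis of $\ell^2(\mathbb{Z})$.
For a given operator $A$, a frequent goal in choosing the basis is to make $\Gamma$ simple, for example, diagonal.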

Example 1.52 (Diagonalizing basis) Let $H = \mathbb{R}^N$, and consider a linear operator $A : H \to H$ given by a symmetric matrix. Such a matrix can be decomposed as $A = \Phi\Lambda\Phi^T$, where the columns of unitary matrix $\Phi$ are eigenvectors of $A$ and $\Lambda$ is the diagonal matrix of corresponding eigenvalues, $\{\lambda_0, \lambda_1, \ldots, \lambda_{N-1}\}$; see (1.223b). Expressing the operator with respect to the orthonormal basis $\Phi$, we obtain

$$\Gamma_{i,k} \overset{(a)}{=} \langle A\varphi_k, \varphi_i\rangle \overset{(b)}{=} \langle \lambda_k\varphi_k, \varphi_i\rangle \overset{(c)}{=} \lambda_k\langle \varphi_k, \varphi_i\rangle \overset{(d)}{=} \lambda_k\,\delta_{i-k},$$

where (a) follows from (1.152); (b) from $(\lambda_k, \varphi_k)$ being an eigenpair of $A$; (c) from linearity in the first argument of the inner product; and (d) from the orthonormality of $\Phi$. Thus, the representation is diagonal: multiplication of a vector by a matrix $A$ is replaced by pointwise multiplication of expansion coefficients of $x$ by the eigenvalues of $A$.

Figure 1.29: Conceptual illustration of computing a linear operator $A : H_0 \to H_1$ using a matrix multiplication $\Gamma : \ell^2(\mathcal{K}_0) \to \ell^2(\mathcal{K}_1)$. The abstract $y = Ax$ can be replaced by the more concrete $\beta = \Gamma\alpha$, where $\alpha$ is the representation of $x$ with respect to basis $\Phi$ of $H_0$, and $\beta$ is the representation of $y$ with respect to basis $\Psi$ of $H_1$.
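A concrete instance of Example 1.52 with $N = 2$: the hypothetical symmetric matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ has orthonormal eigenvectors $\tfrac{1}{\sqrt{2}}[1\ 1]^T$ and $\tfrac{1}{\sqrt{2}}[1\ {-1}]^T$ with eigenvalues 3 and 1. The sketch computes $\Gamma_{i,k} = \langle A\varphi_k, \varphi_i\rangle$ from (1.152) in the standard basis, which recovers $A$, and in the eigenvector basis, which gives a diagonal matrix.

```python
import math

def apply(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matrix_rep(A, basis):
    """Gamma[i][k] = <A phi_k, phi_i> for an orthonormal basis of R^N, as in (1.152)."""
    images = [apply(A, phi) for phi in basis]
    return [[sum(a * b for a, b in zip(images[k], basis[i])) for k in range(len(basis))]
            for i in range(len(basis))]

A = [[2.0, 1.0], [1.0, 2.0]]      # hypothetical symmetric matrix
r = 1 / math.sqrt(2)
eig_basis = [(r, r), (r, -r)]     # eigenvectors of A, eigenvalues 3 and 1
standard = [(1.0, 0.0), (0.0, 1.0)]

print(matrix_rep(A, standard))    # [[2.0, 1.0], [1.0, 2.0]], i.e., A itself
print(matrix_rep(A, eig_basis))   # approximately [[3, 0], [0, 1]]
```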



This simple example is fundamental since many basis changes aim to diagonalize operators. For example, we will see in Chapter 2 that the discrete Fourier transform diagonalizes the circular convolution operator because it is formed from the eigenvectors of the circular convolution operator. As in the example, multiplication by a dense matrix becomes a pointwise multiplication in a new basis.

Moving to cases where the domain and codomain of the linear operator are not necessarily the same Hilbert space, consider a linear operator $A : H_0 \to H_1$, and let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}_0}$ be an orthonormal basis for $H_0$ and $\Psi = \{\psi_k\}_{k\in\mathcal{K}_1}$ be an orthonormal basis for $H_1$. We would like to implement $A$ as an operation on sequence representations with respect to $\Phi$ and $\Psi$. The concept is depicted in Figure 1.29, where orthonormality of the bases gives $\Phi^{-1} = \Phi^*$ and $\Psi^{-1} = \Psi^*$. The computation $y = Ax$ is replaced by $\beta = \Gamma\alpha$, where $\alpha$ is the representation of $x$ with respect to $\Phi$ and $\beta$ is the representation of $y$ with respect to $\Psi$.

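Mimicking the derivation of (1.151) leads to a counterpart for (1.152) that uses both bases:

$$\Gamma = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \langle A\varphi_{-1},\psi_{-1}\rangle & \langle A\varphi_0,\psi_{-1}\rangle & \langle A\varphi_1,\psi_{-1}\rangle & \cdots \\ \cdots & \langle A\varphi_{-1},\psi_0\rangle & \langle A\varphi_0,\psi_0\rangle & \langle A\varphi_1,\psi_0\rangle & \cdots \\ \cdots & \langle A\varphi_{-1},\psi_1\rangle & \langle A\varphi_0,\psi_1\rangle & \langle A\varphi_1,\psi_1\rangle & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{1.153}$$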



Note the information upon which $\Gamma$ is determined: by linearity of $A$ and completeness of the basis $\Phi$, the effect of $A$ on any $x \in H_0$ can be computed from its effect on each basis element of the domain space, $\{A\varphi_k\}_{k\in\mathcal{K}_0}$, and the expansion coefficients of any one of these results are determined by inner products with the basis in the codomain space, $\{\langle A\varphi_k, \psi_i\rangle\}_{i\in\mathcal{K}_1}$.



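Example 1.53 (Averaging operator) Consider the operator $A : H_0 \to H_1$ that replaces a function by its average over intervals of length 2,

$$y(t) = Ax(t) = \frac{1}{2}\int_{2\ell}^{2(\ell+1)} x(\tau)\,d\tau, \qquad \text{for } 2\ell \le t < 2(\ell+1),\ \ell\in\mathbb{Z}, \tag{1.154}$$

where $H_0$ is the space of piecewise-constant, finite-energy functions with breakpoints at integers and $H_1$ is the space of piecewise-constant, finite-energy functions with breakpoints at even integers. As orthonormal bases for $H_0$ and $H_1$, we choose normalized indicator functions over unit and double-unit intervals, respectively:

$$\Phi = \{\varphi_k(t)\}_{k\in\mathbb{Z}} = \{\chi_{[k,k+1)}(t)\}_{k\in\mathbb{Z}}, \qquad \Psi = \{\psi_i(t)\}_{i\in\mathbb{Z}} = \Big\{\tfrac{1}{\sqrt{2}}\chi_{[2i,2(i+1))}(t)\Big\}_{i\in\mathbb{Z}},$$

with the indicator function defined as

$$\chi_I(t) = \begin{cases} 1, & \text{for } t\in I; \\ 0, & \text{otherwise.} \end{cases} \tag{1.155}$$

To evaluate $\Gamma$ from (1.153) requires $\langle A\varphi_k, \psi_i\rangle$ for all integers $k$ and $i$. Since $\varphi_0(t)$ is nonzero only for $t\in[0,1)$,

$$A\varphi_0(t) = \begin{cases} 1/2, & \text{for } 0 \le t < 2; \\ 0, & \text{otherwise,} \end{cases}$$

from which

$$\langle A\varphi_0, \psi_0\rangle = \int_0^2 \frac{1}{2}\cdot\frac{1}{\sqrt{2}}\,dt = \frac{1}{\sqrt{2}}$$

and $\langle A\varphi_0, \psi_i\rangle = 0$ for all $i \neq 0$.

Since $A$ integrates over intervals of the form $[2\ell, 2(\ell+1))$, $A\varphi_1 = A\varphi_0$, so

$$\langle A\varphi_1, \psi_i\rangle = \begin{cases} 1/\sqrt{2}, & \text{for } i = 0; \\ 0, & \text{otherwise.} \end{cases}$$

Continuing the computation to cover every $\varphi_k$ yields $\Gamma$:

$$\Gamma = \frac{1}{\sqrt{2}}\begin{bmatrix} \ddots & \ddots & & & & & & \\ & 1 & 1 & 0 & 0 & 0 & 0 & \\ & 0 & 0 & 1 & 1 & 0 & 0 & \\ & 0 & 0 & 0 & 0 & 1 & 1 & \\ & & & & & & \ddots & \ddots \end{bmatrix}, \tag{1.156}$$

where row $i$ has its two nonzero entries in columns $2i$ and $2i+1$. Multiplying by $\Gamma$ is thus a very simple operation.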
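The following sketch checks (1.156) on a truncated example. It assumes $x$ is supported on $[0, 6)$, so only the coefficients $\alpha_0, \ldots, \alpha_5$ (the values of $x$ on unit intervals) and $\beta_0, \beta_1, \beta_2$ are nonzero; the coefficient values themselves are arbitrary. It compares $\Gamma\alpha$ with a direct computation that averages $x$ over each interval $[2i, 2i+2)$ and then takes the inner product with $\psi_i$.

```python
import math

def averaging_gamma(num_rows):
    """Truncated (1.156): row i has 1/sqrt(2) in columns 2i and 2i + 1, zeros elsewhere."""
    g = 1 / math.sqrt(2)
    return [[g if k in (2 * i, 2 * i + 1) else 0.0 for k in range(2 * num_rows)]
            for i in range(num_rows)]

def direct_coefficients(alpha):
    """Average x over each [2i, 2i + 2), then take <y, psi_i> with psi_i = chi / sqrt(2)."""
    beta = []
    for i in range(len(alpha) // 2):
        level = (alpha[2 * i] + alpha[2 * i + 1]) / 2   # value of y = Ax on [2i, 2i + 2)
        beta.append(level * 2 / math.sqrt(2))           # integral of level / sqrt(2) over length 2
    return beta

alpha = [3.0, 1.0, -2.0, 4.0, 0.5, 0.5]   # values of x on [0, 1), [1, 2), ..., [5, 6)
Gamma = averaging_gamma(3)
beta = [sum(g * a for g, a in zip(row, alpha)) for row in Gamma]
print(beta)                       # approximately [2.828, 1.414, 0.707]
print(direct_coefficients(alpha))
```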



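Matrix Representation of Linear Operator with Biorthogonal Pairs of Bases As above, consider a linear operator $A : H_0 \to H_1$. Assume $\Phi$ and $\widetilde{\Phi}$ form a biorthogonal pair of bases for $H_0$, and $\Psi$ and $\widetilde{\Psi}$ form a biorthogonal pair of bases for $H_1$. We would like to implement $A$ as an operation on sequence representations with respect to $\Phi$ and $\Psi$ as in Figure 1.29, where biorthogonality of the bases gives $\Phi^{-1} = \widetilde{\Phi}^*$ and $\Psi^{-1} = \widetilde{\Psi}^*$.

Derivation of $\Gamma$, the matrix representation of the operator $A$, is almost unchanged from the orthonormal case, but we repeat the key computation to show the role of having biorthogonal bases. When

$$x = \sum_{k\in\mathcal{K}_0} \alpha_k\varphi_k, \tag{1.157}$$

the expansion of

$$y = Ax \tag{1.158}$$

with respect to $\Psi$ has $k$th coefficient

$$\beta_k \overset{(a)}{=} \langle y, \widetilde{\psi}_k\rangle \overset{(b)}{=} \langle Ax, \widetilde{\psi}_k\rangle \overset{(c)}{=} \Big\langle A\Big(\sum_{i\in\mathcal{K}_0}\alpha_i\varphi_i\Big), \widetilde{\psi}_k\Big\rangle \overset{(d)}{=} \Big\langle \sum_{i\in\mathcal{K}_0}\alpha_i A\varphi_i, \widetilde{\psi}_k\Big\rangle \overset{(e)}{=} \sum_{i\in\mathcal{K}_0}\alpha_i\langle A\varphi_i, \widetilde{\psi}_k\rangle,$$

where (a) follows from the expression for expansion coefficients with a biorthogonal pair of bases, (1.104a); (b) from (1.158); (c) from (1.157); (d) from the linearity of $A$; and (e) from the linearity in the first argument of the inner product. Thus the matrix representation (assuming $\mathcal{K}_0 = \mathcal{K}_1 = \mathbb{Z}$ for concreteness) is

$$\Gamma = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \langle A\varphi_{-1},\widetilde{\psi}_{-1}\rangle & \langle A\varphi_0,\widetilde{\psi}_{-1}\rangle & \langle A\varphi_1,\widetilde{\psi}_{-1}\rangle & \cdots \\ \cdots & \langle A\varphi_{-1},\widetilde{\psi}_0\rangle & \langle A\varphi_0,\widetilde{\psi}_0\rangle & \langle A\varphi_1,\widetilde{\psi}_0\rangle & \cdots \\ \cdots & \langle A\varphi_{-1},\widetilde{\psi}_1\rangle & \langle A\varphi_0,\widetilde{\psi}_1\rangle & \langle A\varphi_1,\widetilde{\psi}_1\rangle & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{1.159}$$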
Comparing (1.153) and (1.159), the only difference is the use of the dual basis for the codomain space; this is natural since we require expansions of $\{A\varphi_k\}_{k\in\mathcal{K}_0}$ with respect to $\Psi$. Also note the similarity between (1.149) and (1.159); a change of basis operator is a special case of a matrix representation of a linear operator where $H_0 = H_1 = H$ and the operator is the identity on $H$.

The next example points to how differential operators can be implemented as 
matrix multiplications once bases for the domain and codomain spaces are available. 

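Example 1.54 (Derivative operator) Consider the derivative operator $A : H_0 \to H_1$, with $H_0$ the space of piecewise-linear, continuous, finite-energy functions with breakpoints at integers and $H_1$ the space of piecewise-constant, finite-energy functions with breakpoints at integers. As a basis for $H_0$, choose the hat function

$$\varphi(t) = \begin{cases} 1 - |t|, & \text{for } |t| < 1; \\ 0, & \text{otherwise} \end{cases} \tag{1.160}$$

and its integer shifts:

$$\Phi = \{\varphi_k(t)\}_{k\in\mathbb{Z}} = \{\varphi(t-k)\}_{k\in\mathbb{Z}}.$$

(This is an infinite-dimensional analogue to the basis in Example 1.41 and an example of a spline, discussed in detail in Chapter 5.) For $H_1$, we can choose the same orthonormal basis as in Example 1.53:

$$\Psi = \{\psi_i(t)\}_{i\in\mathbb{Z}} = \{\chi_{[i,i+1)}(t)\}_{i\in\mathbb{Z}}.$$

To evaluate $\Gamma$ from (1.159) requires $\langle A\varphi_k, \widetilde{\psi}_i\rangle$ for all integers $k$ and $i$; because $\Psi$ is orthonormal, $\widetilde{\psi}_i = \psi_i$. Since

$$A\varphi(t) = \varphi'(t) = \begin{cases} 1, & \text{for } -1 < t < 0; \\ -1, & \text{for } 0 < t < 1; \\ 0, & \text{for } |t| > 1, \end{cases}$$

it follows that

$$\langle A\varphi_0, \psi_i\rangle = \begin{cases} 1, & \text{for } i = -1; \\ -1, & \text{for } i = 0; \\ 0, & \text{otherwise.} \end{cases}$$

Shifting $\varphi(t)$ by $k$ simply shifts the derivative:

$$\langle A\varphi_k, \psi_i\rangle = \begin{cases} 1, & \text{for } i = k-1; \\ -1, & \text{for } i = k; \\ 0, & \text{otherwise.} \end{cases}$$

Gathering these computations into a matrix yields

$$\Gamma = \begin{bmatrix} \ddots & \ddots & & & \\ & -1 & 1 & & \\ & & \boxed{-1} & 1 & \\ & & & -1 & 1 \\ & & & & \ddots \end{bmatrix}. \tag{1.161}$$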
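Figure 1.30 gives an example of a derivative operator and its computation. The input function and expansion coefficients in the basis $\Phi$ are

$$x(t) = \varphi(t) - \varphi(t-1), \qquad \alpha = [\,\cdots\ 0\ \ \boxed{1}\ \ {-1}\ \ 0\ \cdots\,]^T,$$

while its derivative and expansion coefficients in the basis $\Psi$ are

$$x'(t) = \psi(t+1) - 2\psi(t) + \psi(t-1), \qquad \beta = [\,\cdots\ 0\ \ 1\ \ \boxed{-2}\ \ 1\ \ 0\ \cdots\,]^T.$$

Then, indeed

$$\beta = \begin{bmatrix} \vdots \\ 1 \\ \boxed{-2} \\ 1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & \ddots & & & \\ & -1 & 1 & & \\ & & \boxed{-1} & 1 & \\ & & & -1 & 1 \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ 0 \\ \boxed{1} \\ -1 \\ 0 \\ \vdots \end{bmatrix} = \Gamma\alpha;$$

that is, $\beta_i = \alpha_{i+1} - \alpha_i$, so $\beta_{-1} = 1 - 0 = 1$, $\beta_0 = -1 - 1 = -2$, and $\beta_1 = 0 - (-1) = 1$.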
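Because row $i$ of (1.161) has $-1$ in column $i$ and $1$ in column $i+1$, applying $\Gamma$ amounts to $\beta_i = \alpha_{i+1} - \alpha_i$. The sketch below applies this to a truncated window of coefficients of $x(t) = \varphi(t) - \varphi(t-1)$ and cross-checks each $\beta_i$ against a numerical slope of $x$ on $[i, i+1)$, which equals $\langle x', \psi_i\rangle$ because $\psi_i$ is the indicator of that unit interval. The window $k = -2, \ldots, 3$ and the finite-difference step are arbitrary choices.

```python
def hat(t):
    """Hat function (1.160)."""
    return max(0.0, 1.0 - abs(t))

def x_of_t(alpha, k_min, t):
    """Evaluate x(t) = sum_k alpha_k * hat(t - k), with alpha[j] <-> k = k_min + j."""
    return sum(a * hat(t - (k_min + j)) for j, a in enumerate(alpha))

def apply_gamma(alpha):
    """Truncated (1.161): beta_i = alpha_{i+1} - alpha_i."""
    return [alpha[j + 1] - alpha[j] for j in range(len(alpha) - 1)]

k_min = -2
alpha = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0]   # x(t) = phi(t) - phi(t - 1), k = -2, ..., 3
beta = apply_gamma(alpha)                 # i = -2, ..., 2
print(beta)                               # [0.0, 1.0, -2.0, 1.0, 0.0]

# Cross-check: central difference at the midpoint of [i, i + 1), where x is linear.
h = 1e-6
for j, b in enumerate(beta):
    i = k_min + j
    mid = i + 0.5
    slope = (x_of_t(alpha, k_min, mid + h) - x_of_t(alpha, k_min, mid - h)) / (2 * h)
    assert abs(slope - b) < 1e-6, (i, slope, b)
```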



Matrix Representation of Adjoint Example 1 1 . 1 E)(ii) | confirmed that the adjoint of 
a linear operator $A : \mathbb{C}^N \to \mathbb{C}^M$ given by a finite matrix is the Hermitian transpose
of the matrix; implicit in this was the use of the standard bases for $\mathbb{C}^N$ and $\mathbb{C}^M$. The
connection between the adjoint and the Hermitian transpose of a matrix extends to 
arbitrary Hilbert spaces and linear operators when orthonormal bases are used. 

Consider a linear operator $A : H_0 \to H_1$, and let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}_0}$ be an orthonormal basis for $H_0$ and $\Psi = \{\psi_k\}_{k\in\mathcal{K}_1}$ be an orthonormal basis for $H_1$. Let $\Gamma$ be the matrix representation of $A$ with respect to $\Phi$ and $\Psi$, as given by (1.153). The adjoint $A^*$ is an operator $H_1 \to H_0$, and we would like to find its matrix representation with respect to $\Psi$ and $\Phi$. Applying (1.153) to $A^*$, the entry in row $i$, column $k$ is

$$\langle A^*\psi_k, \varphi_i\rangle \overset{(a)}{=} \langle \psi_k, A\varphi_i\rangle \overset{(b)}{=} \langle A\varphi_i, \psi_k\rangle^* \overset{(c)}{=} \Gamma_{k,i}^*,$$

where (a) follows from the definition of adjoint; (b) from Hermitian symmetry of the inner product; and (c) from (1.153). Thus, the matrix representation of $A^*$ is indeed the Hermitian transpose of the matrix representation of $A$.
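A quick numerical check of this result in $\mathbb{C}^2$, using the inner product $\langle u, v\rangle = \sum_n u_n v_n^*$ (linear in the first argument). The operator $A$ and the two orthonormal bases are arbitrary illustrative choices, and the adjoint of $A$ is taken to be its conjugate transpose, as recalled above for finite matrices with the standard inner product.

```python
import math

def inner(u, v):
    """<u, v> = sum_n u_n * conj(v_n), linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def herm(M):
    """Conjugate (Hermitian) transpose of a matrix given as a list of rows."""
    return [[complex(M[j][i]).conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matrix_rep(A, dom_basis, cod_basis):
    """Gamma[i][k] = <A phi_k, psi_i> for orthonormal bases, as in (1.153)."""
    return [[inner(apply(A, phi), psi) for phi in dom_basis] for psi in cod_basis]

A = [[1 + 2j, 0.5], [-1j, 3.0]]        # hypothetical operator on C^2
r = 1 / math.sqrt(2)
Phi = [(r, r), (r, -r)]                 # orthonormal basis for the domain
Psi = [(r, 1j * r), (r, -1j * r)]       # orthonormal basis for the codomain

Gamma = matrix_rep(A, Phi, Psi)             # representation of A
Gamma_adj = matrix_rep(herm(A), Psi, Phi)   # representation of A^* (domain H1, codomain H0)
err = max(abs(Gamma_adj[i][k] - herm(Gamma)[i][k]) for i in range(2) for k in range(2))
print(err)   # rounding-level
```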

Figure 1.30: Example of derivative operator. (a) Function $x(t)$. (b) Decomposition in basis for $H_0$: $\varphi(t)$, $-\varphi(t-1)$. (c) Derivative $x'(t)$. (d) Decomposition in basis for $H_1$: $\psi(t+1)$, $-2\psi(t)$, $\psi(t-1)$.

Now remove the assumption that $\Phi$ and $\Psi$ are orthonormal bases and denote the respective dual bases by $\widetilde{\Phi}$ and $\widetilde{\Psi}$. Let $\Gamma$ be the matrix representation of $A$ with respect to $\Phi$ and $\Psi$, as given by (1.159). The matrix representation of $A^*$ has a simple form with respect to the duals $\widetilde{\Psi}$ for $H_1$ and $\widetilde{\Phi}$ for $H_0$. Applying (1.159) to $A^*$, the entry in row $i$, column $k$ is

$$\langle A^*\widetilde{\psi}_k, \varphi_i\rangle \overset{(a)}{=} \langle \widetilde{\psi}_k, A\varphi_i\rangle \overset{(b)}{=} \langle A\varphi_i, \widetilde{\psi}_k\rangle^* \overset{(c)}{=} \Gamma_{k,i}^*,$$

where (a) follows from the definition of adjoint; (b) from Hermitian symmetry of the inner product; and (c) from (1.159). Thus, the matrix representation of $A^*$ is the Hermitian transpose of the matrix representation of $A$ when the bases are switched to the duals. To represent $A^*$ with respect to $\Psi$ and $\Phi$ rather than with respect to their duals is a bit more complicated.

1.6 Computational Aspects 

The cost of an algorithm is generally measured by the number of operations needed 
and the precision requirements for both the input data and the intermediate re- 
sults. These cost metrics are of primary interest and enable comparisons that are 
independent of the computation platform. Running time and hardware resources 
(such as chip area) can be traded off through parallelization and are also affected 
by more subtle algorithmic properties. 

We start with the basics of using operation counts to express complexity of a 
problem and cost of an algorithm. We then discuss precision of the computation in 
terms of the arithmetic representation of a number, followed by conditioning as the 
sensitivity of the solution to changes in the data. We close with one of the most 
fundamental problems in linear algebra: solving systems of linear equations. 
1.6.1 Cost, Complexity, and Asymptotic Notations

Cost and Complexity For a given problem, many algorithms may exist. We can 
measure the cost of computing each of these algorithms and define the complexity of 
the problem as the minimum cost over any possible algorithm. The definition of cost 
should reflect the consumption of relevant resources — computation time, memory, 
circuit area, energy, etc. Sometimes these resources can themselves be traded off; 
for example, parallelism trades area for time or, since slower circuits can operate 
at lower power, area for energy. These trade-offs depend on intricacies of hard- 
ware implementations, but counting arithmetic operations is enough for high-level 
comparisons of algorithms. In particular, we will see benefits from certain problem 
structures. Traditionally, one counts multiplications or both multiplications and 
additions. A multiplication is typically more expensive than an addition, as can be 
seen from the steps involved in longhand multiplication of binary numbers.

The complexity of a problem depends on the computational model (which 
operations are allowed and their costs), the possible inputs, and the format of the 
input. 

Example 1.55 (Complexity of polynomial evaluation) There are several algorithms to evaluate

$$x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3. \tag{1.162}$$

The most obvious is through

output: $a_0 + a_1\cdot t + (a_2\cdot t)\cdot t + ((a_3\cdot t)\cdot t)\cdot t$,

which has $\mu = 6$ multiplications and $\nu = 3$ additions. This is wasteful because powers of $t$ could have been saved and reused. Specifically, the computations

$t^2 = t\cdot t$; $t^3 = t^2\cdot t$; output: $a_0 + a_1\cdot t + a_2\cdot t^2 + a_3\cdot t^3$

give the same final result with $\mu = 5$ and $\nu = 3$. An even cheaper algorithm is

output: $a_0 + t\cdot(a_1 + t\cdot(a_2 + t\cdot a_3))$,

with $\mu = 3$ and $\nu = 3$. In fact, $\mu = 3$ and $\nu = 3$ are the minimum possible multiplicative and additive costs (and hence the problem complexity) for arbitrary input $(a_0, a_1, a_2, a_3, t)$. Restrictions on the input could reduce the complexity.

Other formats for the same polynomial lead to different algorithms with different costs. For example, if the polynomial is given in its factored form

$$x(t) = b_0(t + b_1)(t + b_2)(t + b_3), \tag{1.163}$$

it will have a natural implementation with $\mu = 3$ and $\nu = 3$, matching the complexity of the problem. However, a real polynomial could have complex roots. When using real operations to measure cost, one could assign a complex multiplication $(\mu, \nu) = (4, 2)$ and a complex addition $(\mu, \nu) = (0, 2)$.²⁴ The algorithm based on the factored form (1.163) then has higher cost than that based on the expanded form (1.162).

24 These costs are based on the obvious implementation of complex multiplication as $(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$; a complex multiplication could be computed with 3 real multiplications and 5 real additions, as in Exercise 1.59.

This example illustrates that mathematically-equivalent expressions may not be 
equivalent for computation. We will revisit this, again in the context of polynomials, 
when we discuss precision. 
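The three algorithms of Example 1.55 can be written out with explicit counters for multiplications $\mu$ and additions $\nu$. This is an illustrative sketch: the list `a` holds $a_0, \ldots, a_3$, and the counters are incremented by hand next to each arithmetic operation rather than measured.

```python
def naive(a, t):
    """Evaluate each a_k t^k with k separate multiplications: mu = n(n+1)/2, nu = n."""
    mu = nu = 0
    total = a[0]
    for k in range(1, len(a)):
        term = a[k]
        for _ in range(k):
            term = term * t
            mu += 1
        total = total + term
        nu += 1
    return total, mu, nu

def powers_reused(a, t):
    """Compute t^k once by reusing t^(k-1): mu = 2n - 1, nu = n for degree n >= 1."""
    mu = nu = 0
    total = a[0]
    power = t                  # t^1 needs no multiplication
    for k in range(1, len(a)):
        if k > 1:
            power = power * t  # t^k = t^(k-1) * t
            mu += 1
        total = total + a[k] * power
        mu += 1
        nu += 1
    return total, mu, nu

def horner(a, t):
    """Nested form a0 + t(a1 + t(a2 + t a3)): mu = nu = n."""
    mu = nu = 0
    total = a[-1]
    for coeff in reversed(a[:-1]):
        total = coeff + t * total
        mu += 1
        nu += 1
    return total, mu, nu

a, t = [1, -2, 3, 4], 2
for f in (naive, powers_reused, horner):
    print(f.__name__, f(a, t))   # (value, mu, nu): (41, 6, 3), (41, 5, 3), (41, 3, 3)
```

All three return the same value; only the operation counts differ, which is the point of the example.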

The scaling of cost and complexity with the problem size is typically of interest. 
For example, the complexity determined in Example 1.55 generalizes to $\mu$ and $\nu$
equaling the degree of the polynomial. Finding exact complexities is usually very
difficult, and we are satisfied with coarse descriptions expressed with asymptotic 
notations. 

Asymptotic Notation The most common asymptotic notation is the big O, rooted in the word order. While we define it and other asymptotic notations for sequences indexed by $n\in\mathbb{N}$ with $n\to\infty$, the same notation is used for functions with the argument approaching any finite or infinite limit. Informally, $x = O(y)$ means that $x_n$ is eventually (for large enough $n$) bounded from above by a constant multiple of $y_n$.



Definition 1.53 (Asymptotic notation) Let $x$ and $y$ be sequences defined on $\mathbb{N}$. We say:

(i) $x$ is $O$ of $y$ and write $x\in O(y)$ or $x = O(y)$ when there exist constants $\gamma > 0$ and $n_0\in\mathbb{N}$ such that

$$0 \le x_n \le \gamma y_n, \qquad \text{for all } n \ge n_0; \tag{1.164}$$

(ii) $x$ is $o$ of $y$ and write $x\in o(y)$ or $x = o(y)$ when, for any $\gamma > 0$, there exists a constant $n_0\in\mathbb{N}$ such that (1.164) holds;

(iii) $x$ is $\Omega$ of $y$ and write $x\in\Omega(y)$ or $x = \Omega(y)$ when there exist constants $\gamma > 0$ and $n_0\in\mathbb{N}$ such that

$$\gamma y_n \le x_n, \qquad \text{for all } n \ge n_0;$$

(iv) $x$ is $\Theta$ of $y$ and write $x\in\Theta(y)$ or $x = \Theta(y)$ when $x$ is both $O$ of $y$ and $\Omega$ of $y$, that is, $x\in O(y)\cap\Omega(y)$.



The convenient asymptotic notations necessitate a few notes of caution: The use 
of an equals sign is an abuse of that symbol because asymptotic notations are not 
symmetric relations. Also, if the argument over which one is taking a limit is not 
clear from the context, it should be written explicitly; for example, $2^m n^2 = O_n(n^2)$
is correct when m does not depend on n because the added subscript specifies that 
we are interested only in the scaling with respect to n. Finally, all the asymptotic 
notations omit constant factors that can be critical in assessing and comparing 
algorithms. 
Computing the Cost of an Algorithm Most often, we will compute the cost of an algorithm in terms of the number of operations, multiplications $\mu$ and additions $\nu$, for a total cost of

$$C = \mu + \nu, \tag{1.165}$$

followed by an asymptotic estimation of its behavior in $O$ notation.



Example 1.56 (Matrix multiplication) We illustrate cost and complexity with one of the most basic operations in linear algebra: matrix multiplication. Using the definition (1.204) directly, the product $Q = AB$, with $A\in\mathbb{C}^{M\times N}$ and $B\in\mathbb{C}^{N\times P}$, requires $N$ multiplications and $N-1$ additions for each $Q_{ik}$, for a total of $\mu = MNP$ multiplications and $\nu = M(N-1)P$ additions, and a total cost of $C = M(2N-1)P$. Setting $M = P = N$, the cost for multiplying two $N\times N$ matrices is

$$C = 2N^3 - N^2, \tag{1.166a}$$

and setting $M = N$ and $P = 1$, the cost of multiplying an $N\times N$ matrix by an $N\times 1$ vector is

$$C = 2N^2 - N. \tag{1.166b}$$



By specifying that (1.204) is to be used, we have implicitly identified a particular algorithm for matrix multiplication. Other algorithms may be preferable. There are algorithms that reduce the number of multiplications at the expense of having more additions; for example, for the multiplication of $2\times 2$ matrices, (1.204) gives a cost of 8 multiplications, but we now show that the computation can be accomplished with 7 multiplications. The product



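$$\begin{bmatrix} Q_{00} & Q_{01} \\ Q_{10} & Q_{11} \end{bmatrix} = \begin{bmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{bmatrix}\begin{bmatrix} B_{00} & B_{01} \\ B_{10} & B_{11} \end{bmatrix}$$

can be computed from intermediate results

$$\begin{aligned} h_0 &= (A_{01} - A_{11})(B_{10} + B_{11}), & h_4 &= A_{00}(B_{01} - B_{11}), \\ h_1 &= (A_{00} + A_{11})(B_{00} + B_{11}), & h_5 &= A_{11}(B_{10} - B_{00}), \\ h_2 &= (A_{00} - A_{10})(B_{00} + B_{01}), & h_6 &= (A_{10} + A_{11})B_{00}, \\ h_3 &= (A_{00} + A_{01})B_{11}, & & \end{aligned}$$

as

$$\begin{aligned} Q_{00} &= h_0 + h_1 - h_3 + h_5, & Q_{01} &= h_3 + h_4, \\ Q_{10} &= h_5 + h_6, & Q_{11} &= h_1 - h_2 + h_4 - h_6. \end{aligned}$$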

The μ = 7 multiplications are the minimum possible, but the number of additions increases to ν = 18. This procedure is Strassen's algorithm (see Solved Exercise 1.6).
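A direct transcription of the seven intermediate products into Python makes it easy to confirm that the formulas reproduce the ordinary product. This sketch handles only 2 × 2 inputs with scalar entries. The function names `strassen_2x2` and `naive_2x2` are hypothetical.

```python
def strassen_2x2(A, B):
    """Product of 2 x 2 matrices from the seven products h0, ..., h6
    (7 multiplications, 18 additions/subtractions)."""
    (a00, a01), (a10, a11) = A
    (b00, b01), (b10, b11) = B
    h0 = (a01 - a11) * (b10 + b11)
    h1 = (a00 + a11) * (b00 + b11)
    h2 = (a00 - a10) * (b00 + b01)
    h3 = (a00 + a01) * b11
    h4 = a00 * (b01 - b11)
    h5 = a11 * (b10 - b00)
    h6 = (a10 + a11) * b00
    return [[h0 + h1 - h3 + h5, h3 + h4],
            [h5 + h6, h1 - h2 + h4 - h6]]


def naive_2x2(A, B):
    """Reference product from the definition (8 multiplications, 4 additions)."""
    return [[A[i][0] * B[0][k] + A[i][1] * B[1][k] for k in range(2)]
            for i in range(2)]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The formulas never swap the order of an A factor and a B factor, so the same identities also hold when the entries are matrix blocks. Applying them recursively to N × N matrices, with N a power of 2, is what yields Strassen's $O(N^{\log_2 7})$ multiplication count.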
1.6.2 Precision

Counting operations is important, but so is the trade-off between the cost of a 
specific operation and its precision. In digital computation, operations are over a 
(possibly huge) finite set of values. Not all operations are possible in the sense 
that many operations will have results outside the finite set. How these cases 
are handled — primarily through rounding and saturation — introduces inaccuracy 
into computations. Properties of this inaccuracy depend heavily on the specific 
arithmetic representation of a number. We assume binary arithmetic and discuss 
two dominant cases: fixed-point and floating-point arithmetic. 

Fixed-Point Arithmetic Fixed-point arithmetic deals with operations over a finite set of evenly spaced values. Consider the values to be the integers between 0 and $2^B - 1$, where B is the number of bits used in the representation. A binary string $(b_0, b_1, \ldots, b_{B-1})$ then defines an integer in that range, given by

$$ x = b_0 + b_1 \cdot 2 + b_2 \cdot 2^2 + \cdots + b_{B-1} \cdot 2^{B-1}. \tag{1.167} $$

Negative numbers are handled with an extra bit and various formats; this does not 
change the basic issues. 

The sum of two B-bit numbers is in $\{0, 1, \ldots, 2^{B+1} - 2\}$. The result can thus be represented exactly unless there is overflow, that is, unless the result is too large for the number format. Overflow could result in an error, saturation (setting the result to the largest number $2^B - 1$), or wraparound (returning the remainder in dividing the correct result by $2^B$). All of these make an algorithm difficult to analyze and are generally to be avoided. One could guarantee there is no overflow in the computation x + y by limiting x and y to the lower half of the valid numbers (requiring the most significant bit $b_{B-1}$ to be 0). Reducing the range of possible inputs to ensure accuracy of the computation has its limitations. For example, avoiding overflow in the product x · y requires restricting x and y to be less than $2^{B/2}$ (requiring half of the input bits to be zero).
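The three ways of handling overflow can be made concrete for unsigned B-bit integers. This Python sketch is illustrative only. The function name `add_unsigned` and its `mode` strings are hypothetical, and real hardware or languages typically fix one behavior rather than taking a parameter (for example, C unsigned arithmetic wraps around).

```python
def add_unsigned(x, y, B, mode="saturate"):
    """Add two B-bit unsigned integers (values 0 .. 2**B - 1).

    mode = "error"    -> raise OverflowError when the sum does not fit,
           "saturate" -> clip the result to the largest number 2**B - 1,
           "wrap"     -> return the remainder of the true sum divided by 2**B.
    """
    largest = (1 << B) - 1
    if not (0 <= x <= largest and 0 <= y <= largest):
        raise ValueError("operands must be B-bit unsigned integers")
    s = x + y                            # exact sum, in {0, ..., 2**(B+1) - 2}
    if s <= largest:
        return s                         # no overflow: represented exactly
    if mode == "error":
        raise OverflowError(f"{x} + {y} does not fit in {B} bits")
    if mode == "saturate":
        return largest
    if mode == "wrap":
        return s % (1 << B)
    raise ValueError(f"unknown overflow mode {mode!r}")


print(add_unsigned(200, 100, 8, "saturate"), add_unsigned(200, 100, 8, "wrap"))  # 255 44
```

If both operands are restricted to the lower half, $x, y < 2^{B-1}$, every sum is at most $2^B - 2$, so none of the three overflow behaviors is ever triggered.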

Many other operations, like square root or division by a nonzero number, have results that cannot be represented exactly. Results of these operations will generally be rounded to the nearest valid representation point. Thus, assuming no overflow, the error from a fixed-point computation is bounded through

$$ \frac{|x - \hat{x}|}{x_{\max} - x_{\min}} \le \frac{1/2}{2^B - 1} \approx 2^{-(B+1)}, \tag{1.168} $$

where x is the exact result, $\hat{x}$ is the result of the fixed-point computation, and $x_{\max} - x_{\min}$ is the full range of valid representation points. The error bound is normalized by the range to highlight the role of the number of bits B. To contrast with what we will see for floating-point arithmetic, note that the error could be small relative to x (if |x| is large) or large relative to x (if x is near 0).
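The bound (1.168) can be checked numerically with a simple model of rounding to a uniform grid. The sketch below assumes $2^B$ evenly spaced representation points covering $[x_{\min}, x_{\max}]$ and an input that already lies inside that range. The function name `quantize` is hypothetical. The example also illustrates the point made above: the error is small relative to the range, but not necessarily relative to x.

```python
def quantize(x, B, x_min, x_max):
    """Round x to the nearest of 2**B evenly spaced points from x_min to x_max
    (a model of fixed-point rounding; x is assumed to lie in [x_min, x_max])."""
    gaps = (1 << B) - 1                   # 2**B points leave 2**B - 1 gaps
    step = (x_max - x_min) / gaps
    k = round((x - x_min) / step)         # index of the nearest grid point
    k = min(max(k, 0), gaps)              # guard against round-off at the ends
    return x_min + k * step


B, lo, hi = 8, -1.0, 1.0
bound = 0.5 / (2 ** B - 1)                # right-hand side of (1.168), about 2**-(B+1)
worst = max(abs(x - quantize(x, B, lo, hi)) / (hi - lo)
            for x in (lo + (hi - lo) * i / 1000 for i in range(1001)))
print(worst <= bound * (1 + 1e-9))        # True (tiny slack for float round-off)

x = 0.001                                 # near 0: large error relative to x
print(abs(quantize(x, B, lo, hi) - x) / x)  # about 2.92
```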

Floating-Point Arithmetic Floating-point representations use numbers spread over a vastly larger range, so that overflow is largely avoided. A floating-point representation has significant digits and an exponent. The significant digits are called the mantissa, or significand. They are defined like a fixed-point number, with the binary point typically after the most significant digit, which is 1 by definition. The exponent is a fixed-point binary number used to scale the significand by a power of 2. Consider just strictly positive numbers, and suppose B bits are divided into $B_s$ bits for the significand and $B_e = B - B_s$ bits for the exponent. Then, a number x can be represented in floating-point binary form as

$$ x = \left(1 + \sum_{n=1}^{B_s} b_n 2^{-n}\right) 2^E, \tag{1.169} $$

where E is a fixed-point binary number having $B_e$ bits, chosen so that the leading bit of the significand is 1. Of course, this representation still covers a finite range of possible numbers. Because of the significand/exponent decomposition, however, this range is much larger than with fixed-point arithmetic.

Example 1.57 (32-bit arithmetic) In the IEEE 754-2008 standard for 32-bit floating-point arithmetic, 1 bit is reserved for the sign, 8 for the exponent, and 23 for the significand. Since the leading 1 to the left of the binary point is assumed, the significand effectively has 24 bits. The value of the number is given by

$$ x = (-1)^{\text{sign}} \left(1 + \sum_{n=1}^{23} b_n 2^{-n}\right) 2^{E-127} \tag{1.170} $$

for $E \in \{1, 2, \ldots, 254\}$. The two remaining values of E are used differently. E = 255 is used for ±∞ and "not a number" (NaN). E = 0 is used for 0 if the significand is zero and for subnormal numbers if the significand is nonzero. The subnormal numbers extend the range of representable positive numbers below $2^{-126}$ through

$$ x = (-1)^{\text{sign}} \left(\sum_{n=1}^{23} b_n 2^{-n}\right) 2^{-126}. $$

All the positive numbers that can be represented through (1.170) lie in

$$ [2^{-126}, 2 \cdot 2^{127}) \approx [1.18 \cdot 10^{-38}, 3.4 \cdot 10^{38}], $$

and similarly for negative numbers. To this we add subnormal numbers, the minimum of which is approximately $1.4 \cdot 10^{-45}$. Comparing this to $[0, 4.3 \cdot 10^{9}]$ (with integer spacing) for 32-bit fixed-point arithmetic shows an advantage of floating-point arithmetic.
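The encoding in Example 1.57 can be inspected with Python's standard `struct` module, which packs a float into the 4-byte IEEE 754 single-precision format. Note that `struct.pack` first rounds the input to single precision and rejects finite values that are too large for it. The sketch below extracts the sign, the biased exponent E, and the 23 stored significand bits, then rebuilds the value from (1.170) or from the subnormal formula. The function name `decode_float32` is hypothetical.

```python
import struct


def decode_float32(value):
    """Return (sign, E, frac, rebuilt) for the IEEE 754 single-precision encoding
    of `value`. Here frac holds the 23 stored bits b_1 ... b_23 as an integer, and
    `rebuilt` is recomputed from (1.170) or from the subnormal formula."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    E = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    fraction = frac / 2 ** 23                 # sum_{n=1}^{23} b_n 2^{-n}
    if 1 <= E <= 254:                         # normal numbers, (1.170)
        rebuilt = (-1) ** sign * (1 + fraction) * 2.0 ** (E - 127)
    elif E == 0:                              # zero (frac == 0) or subnormal
        rebuilt = (-1) ** sign * fraction * 2.0 ** -126
    else:                                     # E == 255: infinity or NaN
        rebuilt = float("nan") if frac else (-1) ** sign * float("inf")
    return sign, E, frac, rebuilt


for v in (1.0, -2.5, 0.1, 2.0 ** -126, 2.0 ** -149):
    print(v, decode_float32(v))
```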

In floating-point arithmetic following (1.169), the difference between a real number and the closest valid representation may be large, but only if the number itself is large. Suppose x is positive and neither too large nor too small to be represented (no overflow, and above the subnormal range). Then its representation $\hat{x}$ will satisfy

$$ \hat{x} = (1 + \epsilon)\,x, \quad \text{where } |\epsilon| < 2^{-B_s}. \tag{1.171} $$
Figure 1.31: Two algorithms for computing an average. (a) Recursive summation. (b) Tree-based summation.



This is very different from (1.168); the error $|\hat{x} - x|$ may be large, but it is not large relative to x.

Multiplication of floating-point numbers amounts to the product of the significands and the sum of the exponents, possibly followed by rescaling. Addition is more involved. When the exponents are not equal, the significand of the smaller operand has to be shifted to align the binary points before the significands can be added. When the exponents are very different, the smaller term will lose precision and could even be set to zero. Nevertheless, adding positive numbers or multiplying positive numbers, within the range of the number system, will have error satisfying (1.171). Much more troublesome is that subtracting numbers that are nearly equal can result in cancellation of many leading bits. This is called a loss of significance error.
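Both effects can be observed by emulating single precision in Python. The helper `f32` below rounds a double to the nearest float32 through the standard `struct` module; its name is hypothetical. The sample values are arbitrary choices within the normal range.

```python
import struct


def f32(x):
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


B_s = 23  # stored significand bits in IEEE single precision (Example 1.57)

# Rounding numbers in the normal range: relative error below 2**-B_s, as in (1.171).
samples = [0.1, 1 / 3, 2 ** 0.5, 6.02214076e23, 1.6e-19]
worst = max(abs(f32(x) - x) / abs(x) for x in samples)
print(worst < 2.0 ** -B_s)            # True

# Loss of significance: both operands are stored accurately, but their difference is not.
a, b = f32(1.0 + 1e-7), f32(1.0)
diff = a - b                          # exact: a and b are adjacent float32 values
print(diff, abs(diff - 1e-7) / 1e-7)  # 1.1920928955078125e-07, relative error about 0.19
```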

Example 1.58 (Computing an average) We now highlight how the choice of an algorithm affects precision on one of the simplest operations, computation of the average of N numbers,

$$ \bar{x} = \frac{1}{N} \sum_{n=0}^{N-1} x_n. $$

An obvious algorithm is the recursive procedure illustrated in Figure 1.31(a),

$$ x^{(0)} = x_0, \qquad x^{(n)} = x^{(n-1)} + x_n, \quad n = 1, 2, \ldots, N-1, \qquad \bar{x} = \frac{1}{N}\, x^{(N-1)}. $$
When all the $x_n$ are close in value, the summands in step n above differ in size by a factor of n (since $x^{(n-1)}$ is the partial sum of the first n numbers). With N large, this becomes problematic as n grows.
A simple alternative is summing on a tree as in Figure 1.31(b). Assume $N = 2^M$, and introduce sequences $x^{(\ell)}_n$ as partial sums of $2^\ell$ terms,

$$ \begin{aligned} x^{(0)}_n &= x_n, & n &= 0, 1, \ldots, N-1, \\ x^{(\ell)}_n &= x^{(\ell-1)}_{2n} + x^{(\ell-1)}_{2n+1}, & \ell &= 1, 2, \ldots, M, \quad n = 0, 1, \ldots, 2^{M-\ell} - 1, \\ \bar{x} &= \tfrac{1}{N}\, x^{(M)}_0. & & \end{aligned} $$

Because all summations are of terms of similar size, the precision of the result will improve. Note that the number of additions is the same as in the previous algorithm, that is, N − 1.
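The difference between the two summation orders shows up even in a small experiment. The sketch below emulates float32 storage by rounding after every addition, using a hypothetical helper `f32` built on the standard `struct` module. It averages $N = 2^{16}$ copies of the float32 number nearest 0.1, so the exact average is known in advance. Both functions perform N − 1 additions; the function names are illustrative.

```python
import struct


def f32(x):
    """Round a Python float to the nearest float32 (emulates single precision)."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def average_recursive(xs):
    """Recursive summation x^(n) = x^(n-1) + x_n, rounded to float32 after each addition."""
    s = xs[0]
    for x in xs[1:]:
        s = f32(s + x)
    return f32(s / len(xs))


def average_tree(xs):
    """Tree summation for N = 2**M terms: repeatedly add neighboring pairs,
    rounding each partial sum to float32."""
    level = list(xs)
    while len(level) > 1:
        level = [f32(level[2 * n] + level[2 * n + 1]) for n in range(len(level) // 2)]
    return f32(level[0] / len(xs))


v = f32(0.1)
xs = [v] * 2 ** 16                          # exact average is v itself
print(abs(average_recursive(xs) - v) / v)   # clearly nonzero
print(abs(average_tree(xs) - v) / v)        # 0.0 here: every pair sum is exact
```

With identical inputs the tree result is exact only because doubling is exact in binary floating point. For general inputs of similar size, the tree order still keeps the two summands at each level comparable, which is the point made above.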

1.6.3 Conditioning 

So far, we have discussed two issues. The first is algorithmic efficiency, or the number of operations required to solve a given problem. The second is precision of the computation, which is linked both to machine precision and to algorithmic structure (as in Example 1.58).

We now discuss conditioning of a problem, which describes the sensitivity of 
the solution to changes in the data. In an ill-conditioned problem, the solution can 
vary widely with small changes in the input. Ill-conditioned problems also tend 
to be more sensitive to algorithmic choices that would be immaterial with exact 
arithmetic. We study conditioning by looking at the solution of systems of linear 
equations. 

Given a system of linear equations y = Ax, where x is a vector of N numbers and A is an N × N matrix of full rank, we know a unique solution exists, $x = A^{-1}y$. The
condition number we introduce shortly will roughly say how sensitive the solution 
x will be to small changes in y. In particular, if the condition number is large, a 
tiny change (error) in y can lead to a large change (error) in x. Conversely, a small 
condition number signifies that the error in x will be of the same order as the error 
in y. 

For this discussion, we use the 2-norm as defined in (1.216):

$$ \|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^*A)}. \tag{1.174a} $$

Similarly,

$$ \|A^{-1}\|_2 = \frac{1}{\sigma_{\min}(A)} = \frac{1}{\sqrt{\lambda_{\min}(A^*A)}}. \tag{1.174b} $$



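For now, consider the matrix A to be exact, and let us see how changes in y affect the solution x. We write the perturbed data as $\hat{y} = y + \Delta y$ and the perturbed solution as $\hat{x} = x + \Delta x$:

$$ \hat{x} = x + \Delta x = A^{-1}\hat{y} = A^{-1}y + A^{-1}\Delta y. \tag{1.175} $$

Using that $\|Ax\| \le \|A\|\,\|x\|$ for any vector norm and its induced matrix norm,

$$ \|\Delta x\|_2 = \|A^{-1}\Delta y\|_2 \le \|A^{-1}\|_2\, \|\Delta y\|_2. \tag{1.176} $$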
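To find the relative error, we divide (1.176) by the norm of x and use $\|y\|_2 = \|Ax\|_2 \le \|A\|_2\, \|x\|_2$:

$$ \frac{\|\Delta x\|_2}{\|x\|_2} \le \frac{\|A^{-1}\|_2\, \|\Delta y\|_2}{\|x\|_2} \le \|A\|_2\, \|A^{-1}\|_2\, \frac{\|\Delta y\|_2}{\|y\|_2} = \kappa(A)\, \frac{\|\Delta y\|_2}{\|y\|_2}, \tag{1.177} $$

where κ(A) is called a condition number of a matrix A,

$$ \kappa(A) = \|A\|_2\, \|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)} = \sqrt{\frac{\lambda_{\max}(A^*A)}{\lambda_{\min}(A^*A)}}. \tag{1.178a} $$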



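When A is a basis synthesis operator, $\lambda_{\min}(A^*A)$ and $\lambda_{\max}(A^*A)$ are the constants in Definition 1.34 (Riesz basis). For a Hermitian matrix (or more generally a normal matrix), the condition number is simply

$$ \kappa(A) = \frac{|\lambda_{\max}(A)|}{|\lambda_{\min}(A)|}, \tag{1.178b} $$

where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the eigenvalues of A of largest and smallest magnitude.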



From (1.177), we see that κ(A) measures the sensitivity of the solution: a small amount of noise on a data vector y might grow by a factor κ(A) in the solution vector x. In some ill-conditioned problems, the ratio between the largest and smallest eigenvalues in (1.178) can span several orders of magnitude, sometimes leading to a useless solution. At the other extreme, the best-conditioned problems appear when A is unitary, since then κ(A) = 1; the error in the solution is similar to the error in the input.

Poor conditioning can come from a large $|\lambda_{\max}|$, but more often it comes from a small $|\lambda_{\min}|$, that is, from the matrix A being almost singular. We would like to find out how close A is to a singular matrix. In other words, can a perturbation ΔA lead to a singular matrix (A + ΔA)? It can be shown that the minimum relative perturbation of A, or $\min(\|\Delta A\|_2 / \|A\|_2)$ such that A + ΔA becomes singular, equals 1/κ(A). We show this in a simple case, namely, when both A and its perturbation are diagonalizable by the same unitary matrix U (this happens for certain structured matrices). Then

$$ U^* A U = \Lambda \quad \text{and} \quad U^* \Delta A\, U = \Delta\Lambda, \tag{1.179} $$

where Λ is the diagonal matrix of eigenvalues of A and ΔΛ is the perturbation. The minimum relative perturbation that makes A singular is

$$ \min \frac{\|\Delta A\|_2}{\|A\|_2} \overset{(a)}{=} \min \frac{\|U^* \Delta A\, U\|_2}{\|U^* A U\|_2} \overset{(b)}{=} \frac{\min \|\Delta\Lambda\|_2}{\|\Lambda\|_2} \overset{(c)}{=} \frac{|\lambda_{\min}|}{|\lambda_{\max}|} \overset{(d)}{=} \frac{1}{\kappa(A)}, \tag{1.180} $$

where (a) follows from U being unitary; (b) from (1.179) and the fact that the optimization is over the perturbation, not over A; (c) from (1.174a) and $\Lambda - \operatorname{diag}([0, 0, \ldots, \lambda_{\min}])$ being the singular matrix closest to Λ, so that $\min \|\Delta\Lambda\|_2 = |\lambda_{\min}|$; and (d) from (1.178b).



Example 1.59 (Conditioning of matrices, Example 1.28 cont'd) Take the matrix associated with the basis in Example 1.28(i),

$$ A = \begin{bmatrix} \varphi_0 & \varphi_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & a \end{bmatrix}, \qquad a \in (0, \infty). $$

The eigenvalues of A are 1 and a, from which the condition number follows as

$$ \kappa(A) = \begin{cases} a, & a \ge 1; \\ 1/a, & a < 1. \end{cases} $$

For Example 1.28(ii),

$$ A = \begin{bmatrix} \varphi_0 & \varphi_1 \end{bmatrix} = \begin{bmatrix} 1 & \cos\theta \\ 0 & \sin\theta \end{bmatrix}, \qquad \theta \in (0, \pi/2]. \tag{1.181} $$

The singular value decomposition (1.213) of this matrix A leads to the condition number

$$ \kappa(A) = \sqrt{\frac{1 + \cos\theta}{1 - \cos\theta}}, \tag{1.182} $$

which is plotted in Figure 1.32(a) on a log scale. As expected, A is ill conditioned as $\theta \to 0$.
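The condition numbers in Example 1.59 can be checked numerically from (1.178a) by computing the two eigenvalues of $A^T A$ in closed form. This is a small sketch for real, nonsingular 2 × 2 matrices only. The function name `cond2_2x2` is hypothetical; for larger matrices, one would normally use a library SVD routine.

```python
import math


def cond2_2x2(A):
    """2-norm condition number of a real, nonsingular 2 x 2 matrix, via (1.178a):
    kappa = sqrt(lambda_max(A^T A) / lambda_min(A^T A))."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix G = A^T A.
    g00, g01, g11 = a * a + c * c, a * b + c * d, b * b + d * d
    # Eigenvalues of a symmetric 2 x 2 matrix: mean of the diagonal +/- radius.
    mid = (g00 + g11) / 2
    radius = math.sqrt(((g00 - g11) / 2) ** 2 + g01 ** 2)
    return math.sqrt((mid + radius) / (mid - radius))


# First matrix of Example 1.59: kappa = max(a, 1/a).
print(cond2_2x2([[1, 0], [0, 4]]), cond2_2x2([[1, 0], [0, 0.25]]))  # 4.0 4.0

# Matrix (1.181): compare with (1.182) as theta shrinks.
for theta in (math.pi / 2, math.pi / 8, 0.01):
    A = [[1, math.cos(theta)], [0, math.sin(theta)]]
    print(theta, cond2_2x2(A), math.sqrt((1 + math.cos(theta)) / (1 - math.cos(theta))))
```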

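Example 1.60 (Averaging, Example 1.29 cont'd) We take another look at Example 1.29 from a matrix conditioning point of view. Take the following N × N matrix:

$$ A = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \frac{1}{2} & \frac{1}{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix} = \begin{bmatrix} 1 & & & \\ & \frac{1}{2} & & \\ & & \ddots & \\ & & & \frac{1}{N} \end{bmatrix} \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}, $$

which computes the successive averages

$$ \bar{x}^{(k)} = \frac{1}{k} \sum_{n=0}^{k-1} x_n, \qquad k = 1, 2, \ldots, N. \tag{1.183} $$

This matrix is nonsingular: it is a product of a diagonal matrix and a lower-triangular matrix, both with positive diagonal entries. Even so, solving y = Ax (finding the original values from the averages) is an ill-conditioned problem, because the dependence of y on $x_n$ diminishes with increasing n. Figure 1.32(b) shows the condition number κ(A) for N = 2, 3, ..., 50 on a log-log scale.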



1.6.4 Solving Systems of Linear Equations 

Having discussed conditioning of linear systems, let us consider algorithms to com- 
pute the solution. 
Figure 1.32: Behaviors of the condition numbers of matrices. (a) The matrices from (1.181) with condition numbers (1.182), plotted on a log scale. (b) The matrices from (1.183), plotted on a log-log scale.



Gaussian Elimination The standard algorithm to solve a system of linear equations is Gaussian elimination. The algorithm uses elementary row operations to create a new system of equations y' = A'x, where A' is now an upper-triangular matrix. Back substitution from $x_{N-1}$ up to $x_0$ then yields the unknown vector x.

We can easily obtain the upper-triangular A' by working on one column at a time and using orthogonality of length-two vectors. For example, given a vector $[a_0 \ a_1]^T$, the vector $[a_1 \ {-a_0}]^T$ is automatically orthogonal to it. To transform the first column, we premultiply A by the matrix $B^{(1)}$ with entries

$$ B^{(1)} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ a_{1,0} & -a_{0,0} & 0 & \cdots & 0 \\ a_{2,0} & 0 & -a_{0,0} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{N-1,0} & 0 & 0 & \cdots & -a_{0,0} \end{bmatrix}, \tag{1.184} $$

leading to a new matrix $A^{(1)} = B^{(1)}A$ with the first column $[a_{0,0} \ 0 \ \cdots \ 0]^T$. We can continue the process by iterating on the lower-right submatrix of size (N − 1) × (N − 1), and so on, leading to an upper-triangular matrix

$$ A^{(N-1)} = B^{(N-1)} \cdots B^{(2)} B^{(1)} A = \begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \times \end{bmatrix}. \tag{1.185} $$

The initial system of equations is thus transformed into a triangular one,

$$ B^{(N-1)} \cdots B^{(2)} B^{(1)} y = A^{(N-1)} x, $$

which is solved easily by back substitution.
Example 1.61 (Triangularization and back substitution) Given a 3 × 3 system y = Ax with a rank-3 matrix A, we can show that

$$ B^{(1)} = \begin{bmatrix} 1 & 0 & 0 \\ a_{1,0} & -a_{0,0} & 0 \\ a_{2,0} & 0 & -a_{0,0} \end{bmatrix}, \qquad B^{(2)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & a_{2,0}a_{0,1} - a_{0,0}a_{2,1} & a_{0,0}a_{1,1} - a_{1,0}a_{0,1} \end{bmatrix}. $$

The new system is now of the form

$$ y' = B^{(2)} B^{(1)} y = B^{(2)} B^{(1)} A x = A' x, \tag{1.186} $$

with the matrix A' upper triangular and $a'_{i,i} \neq 0$ (because A is of full rank). We can solve for x using back substitution,

$$ x_2 = \frac{y'_2}{a'_{2,2}}, \qquad x_1 = \frac{1}{a'_{1,1}}\left(y'_1 - a'_{1,2} x_2\right), \qquad x_0 = \frac{1}{a'_{0,0}}\left(y'_0 - a'_{0,1} x_1 - a'_{0,2} x_2\right). $$

The form of the solution indicates that conditioning is a key issue: if some $a'_{i,i}$ is close to zero, the solution might be ill behaved.

In the above discussion, we did not address the ordering of operations, that is, which row is used to zero the entries of a particular column. In practice, this choice, called choosing the pivot, is important for the numerical behavior of the algorithm.
In general, the solution of a linear system depends on whether the vector y belongs 
to the range (column space) of A. If not, there is no possible solution. If it does, 
and the columns are linearly independent, the solution is unique; otherwise there is 
an infinite number of solutions. 
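The elimination and back-substitution steps, together with the pivot choice just mentioned, fit in a short Python sketch. It uses the common multiplier form $\text{row}_i \leftarrow \text{row}_i - (a_{i,k}/a_{k,k})\,\text{row}_k$. This differs from applying the rows of $B^{(k)}$ only by the nonzero scale factor $-a_{k,k}$, with entries taken from the current matrix. The function name `gaussian_solve` and its `pivot` flag are illustrative. Partial pivoting, which swaps in the row with the largest $|a_{i,k}|$, is one common pivoting rule, not the only one.

```python
def gaussian_solve(A, y, pivot=True):
    """Solve y = A x by Gaussian elimination to upper-triangular form,
    followed by back substitution from x_{N-1} up to x_0.

    With pivot=True, partial pivoting swaps the row with the largest |a_ik|
    (among rows k..N-1) into position k before eliminating column k."""
    N = len(A)
    # Augmented matrix [A | y]; row operations act on A and y together.
    M = [[float(v) for v in row] + [float(yi)] for row, yi in zip(A, y)]
    for k in range(N - 1):
        if pivot:
            p = max(range(k, N), key=lambda i: abs(M[i][k]))
            M[k], M[p] = M[p], M[k]
        if M[k][k] == 0.0:
            raise ZeroDivisionError(f"zero pivot in column {k}")
        for i in range(k + 1, N):
            m = M[i][k] / M[k][k]
            for j in range(k, N + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * N
    for i in range(N - 1, -1, -1):
        s = M[i][N] - sum(M[i][j] * x[j] for j in range(i + 1, N))
        x[i] = s / M[i][i]
    return x


# A tiny leading pivot: without pivoting, x_0 is lost to rounding.
A = [[1e-20, 1.0], [1.0, 1.0]]
y = [1.0, 2.0]                            # exact solution is very close to [1, 1]
print(gaussian_solve(A, y, pivot=False))  # [0.0, 1.0]
print(gaussian_solve(A, y, pivot=True))   # [1.0, 1.0]
```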

Cost of Gaussian Elimination The cost of Gaussian elimination is dominated by the cost of forming the product $B^{(N-1)} \cdots B^{(2)} B^{(1)} A$, which results in the upper-triangular matrix $A^{(N-1)}$. Multiplying $B^{(1)}$ and A uses $\Theta(N^2)$ multiplications and additions, so after N − 2 additional such multiplications, $A^{(N-1)}$ is formed with $\Theta(N^3)$ multiplications and additions. The other steps are cheaper. Forming $B^{(N-1)} \cdots B^{(2)} B^{(1)} y$ has $\Theta(N^2)$ cost, and back substitution requires N divisions and $\Theta(N^2)$ multiplications and additions. A careful accounting gives a total multiplicative cost of about $\frac{1}{3}N^3$.

One can use Gaussian elimination to calculate the inverse of a matrix A. Solving $Ax = e_k$, where $e_k$ is the kth vector of the standard basis, gives the kth column of $A^{-1}$. The cost of finding each column in this manner is $\Theta(N^3)$, so this overall algorithm has $\Theta(N^4)$ cost for the full inverse. This is very inefficient. A better inversion algorithm forms and saves the LU decomposition of A while solving $Ax = e_0$ with Gaussian elimination; its cost is only $\Theta(N^3)$. Note that we rarely calculate $A^{-1}$ explicitly: solving y = Ax is no cheaper with $A^{-1}$ than with the LU decomposition of A.

Sparse Matrices and Iterative Solutions of Systems of Linear Equations If the matrix-vector product Ax is easy to compute, that is, with a cost substantially smaller than $N^2$, then iterative solvers can be considered. This is the case when A is sparse or banded as in (1.229) and has only O(N) nonzero entries.

An iterative algorithm computes a new approximate solution from an old approximate solution with an update step; if properly designed, it will converge to the solution. The basic idea is to write A = D − B, transforming y = Ax into

$$ Dx = Bx + y. \tag{1.187} $$

The update step is

$$ x^{(k+1)} = D^{-1}\left(B x^{(k)} + y\right), \tag{1.188} $$

which has the desired solution as a fixed point. If $D^{-1}$ is easy to compute (for example, if D is diagonal), then this is a valid approach.

To study the error, let $e^{(k)} = x - x^{(k)}$. Subtracting

$$ D x^{(k+1)} = B x^{(k)} + y $$

from (1.187) yields

$$ e^{(k+1)} = D^{-1} B\, e^{(k)} = \left(D^{-1} B\right)^{k+1} e^{(0)}, $$

with $e^{(0)}$ the initial error. The algorithm will converge to the true solution for any initial guess $x^{(0)}$ if and only if $(D^{-1}B)^{k+1} \to 0$ as $k \to \infty$. This happens exactly when all the eigenvalues of $D^{-1}B$ are smaller than 1 in absolute value.
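Choosing D as the diagonal part of A, which gives the classical Jacobi method (one particular choice of D), turns the update (1.188) into a few lines of Python. The function name `jacobi`, the fixed iteration count, and the two test matrices are assumptions for illustration. The first matrix satisfies the eigenvalue condition; the second violates it.

```python
def jacobi(A, y, iters=100, x0=None):
    """Iterate x^(k+1) = D^{-1}(B x^(k) + y) with A = D - B and D = diag(A).
    Assumes every diagonal entry of A is nonzero."""
    N = len(A)
    x = list(x0) if x0 is not None else [0.0] * N
    for _ in range(iters):
        # B = D - A has zero diagonal, so (B x)_i = -sum_{j != i} a_ij x_j.
        x = [(y[i] - sum(A[i][j] * x[j] for j in range(N) if j != i)) / A[i][i]
             for i in range(N)]
    return x


# Eigenvalues of D^{-1}B are 0 and +/- sqrt(2)/4: converges to [1, 2, 3].
print(jacobi([[4, 1, 0], [1, 4, 1], [0, 1, 4]], [6, 12, 14]))

# Eigenvalues of D^{-1}B are +/- 2: the error doubles in size at every step.
print(jacobi([[1, 2], [2, 1]], [3, 3], iters=20))
```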

Example 1.62 (Iterative solution of a Toeplitz system) Take a Toeplitz matrix A as in (1.228) and write it as D − B = I − (I − A). Then (1.188) reduces to

$$ x^{(k+1)} = (I - A)\, x^{(k)} + y. \tag{1.189} $$

Note that B = I − A is still Toeplitz, which allows a fast multiplication for evaluating $Bx^{(k)}$, as will be seen in Section 2.9. If the eigenvalues of $D^{-1}B = I - A$ are smaller than 1 in absolute value, the iterative algorithm will converge.

As an example, consider the matrix describing a two-point sum,

$$ A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, $$

in the system Ax = y with $y = [1 \ 3 \ 5 \ 7]^T$. The eigenvalues of (I − A) are all 0, and thus the algorithm will converge. For example, start with an all-zero vector $x^{(0)}$. The iterative procedure (1.189) produces the following:

$$ x^{(1)} = \begin{bmatrix} 1 \\ 3 \\ 5 \\ 7 \end{bmatrix}, \quad x^{(2)} = \begin{bmatrix} 1 \\ 2 \\ 2 \\ 2 \end{bmatrix}, \quad x^{(3)} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 5 \end{bmatrix}, \quad x^{(4)} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}, $$

and converges in the fourth step ($x^{(n)} = x^{(4)}$ for n ≥ 5).
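The iterates of Example 1.62 can be reproduced in a few lines. For the two-point-sum matrix, I − A has −1 on the first subdiagonal and zeros elsewhere, so (1.189) reads $x^{(k+1)}_n = y_n - x^{(k)}_{n-1}$, with $x^{(k)}_{-1} = 0$. The function name below is hypothetical.

```python
def two_point_sum_iterates(y, steps):
    """Run x^(k+1) = (I - A) x^(k) + y from x^(0) = 0, where A has ones on the
    diagonal and first subdiagonal; return the list x^(1), ..., x^(steps)."""
    x = [0] * len(y)
    history = []
    for _ in range(steps):
        # The right-hand side uses the previous iterate x throughout.
        x = [y[n] - (x[n - 1] if n > 0 else 0) for n in range(len(y))]
        history.append(x)
    return history


for k, xk in enumerate(two_point_sum_iterates([1, 3, 5, 7], 5), start=1):
    print(k, xk)
# 1 [1, 3, 5, 7]
# 2 [1, 2, 2, 2]
# 3 [1, 2, 3, 5]
# 4 [1, 2, 3, 4]
# 5 [1, 2, 3, 4]
```

Because I − A is strictly lower triangular, $(I - A)^N = 0$. For any y, this iteration therefore reaches the exact solution after at most N steps; here N = 4.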

Among iterative solvers of large systems of linear equations, Kaczmarz's algorithm 
has an intuitive geometric interpretation. 

Example 1.63 (Kaczmarz's algorithm) Consider a square system of linear equations y = Ax with A real and of full rank. We can look for the solution $x = [x_0 \ x_1 \ \cdots \ x_{N-1}]^T$ in two ways, concentrating on either the columns or the rows of A. When concentrating on the columns $\{v_0, v_1, \ldots, v_{N-1}\}$, we see the solution x as giving the coefficients to form y as a linear combination of columns:

$$ y = \sum_{n=0}^{N-1} x_n v_n. \tag{1.190a} $$

When concentrating on the rows $\{r_0^T, r_1^T, \ldots, r_{N-1}^T\}$, we see the solution x as the vector that has all the correct inner products:

$$ \langle x, r_n \rangle = y_n, \qquad n = 0, 1, \ldots, N-1. \tag{1.190b} $$

Kaczmarz's algorithm uses the row-based view. Normalize $r_n$ to unit norm, $\gamma_n = r_n / \|r_n\|$. Then (1.190b) becomes

$$ \langle x, \gamma_n \rangle = \frac{y_n}{\|r_n\|} = y'_n, \qquad n = 0, 1, \ldots, N-1. \tag{1.191} $$

The idea of Kaczmarz's algorithm is to iteratively satisfy the constraints (1.191). Starting with an initial guess $x^{(-1)}$, the first update step consists of N computations

$$ x^{(n)} = x^{(n-1)} + \left(y'_n - \langle x^{(n-1)}, \gamma_n \rangle\right) \gamma_n, \qquad n = 0, 1, \ldots, N-1, \tag{1.192} $$

called a sweep. With the update (1.192), $x^{(n)}$ satisfies

$$ \langle x^{(n)}, \gamma_n \rangle = \langle x^{(n-1)}, \gamma_n \rangle + y'_n \langle \gamma_n, \gamma_n \rangle - \langle \gamma_n, \gamma_n \rangle \langle x^{(n-1)}, \gamma_n \rangle = y'_n, $$

as desired. At the end of this sweep, $x^{(N-1)}$ will most likely not satisfy $\langle x^{(N-1)}, \gamma_0 \rangle = y'_0$, and thus further sweeps are required.

To understand the algorithm geometrically, note that the update $x^{(0)}$ is the orthogonal projection of the initial guess $x^{(-1)}$ onto the affine subspace $S_0$. This subspace is orthogonal to the subspace spanned by $\gamma_0$ and lies at the distance $y'_0$ from the origin, as in Figure 1.33. The desired solution x is the single point in $\bigcap_{n=0}^{N-1} S_n$. Convergence is geometric in the number of sweeps, with a constant depending on how close to orthogonal the vectors $\gamma_n$ are. When the rows of A are orthogonal, convergence is in one sweep (see Exercise 1.61).

Figure 1.33: One step of Kaczmarz's algorithm. The update $x^{(0)}$ is the orthogonal projection of the initial guess $x^{(-1)}$ onto the affine subspace $S_0$ orthogonal to the subspace spanned by $\gamma_0$ and at the distance $y'_0$ from the origin 0.
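A compact Python sketch of the sweeps (1.192) follows. The function name `kaczmarz`, the fixed number of sweeps, and the two small test systems are assumptions for illustration. The second system has orthogonal rows, so a single sweep suffices, consistent with the remark above.

```python
import math


def kaczmarz(A, y, sweeps=50, x=None):
    """Kaczmarz's algorithm: in each sweep, for n = 0, ..., N-1, project the current
    estimate onto the affine subspace S_n = {x : <x, gamma_n> = y'_n}, following (1.192)."""
    x = list(x) if x is not None else [0.0] * len(A[0])   # initial guess x^(-1)
    for _ in range(sweeps):
        for r, yn in zip(A, y):
            norm = math.sqrt(sum(v * v for v in r))
            gamma = [v / norm for v in r]          # unit-norm row gamma_n
            y_prime = yn / norm                    # y'_n = y_n / ||r_n||
            step = y_prime - sum(xi * g for xi, g in zip(x, gamma))
            x = [xi + step * g for xi, g in zip(x, gamma)]
    return x


print(kaczmarz([[2, 1], [1, 3]], [3, 4], sweeps=1))    # about [1.3, 0.9] after one sweep
print(kaczmarz([[2, 1], [1, 3]], [3, 4], sweeps=60))   # close to [1, 1]
print(kaczmarz([[1, 1], [1, -1]], [2, 0], sweeps=1))   # orthogonal rows: about [1, 1]
```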

In Figure 1.34, we show three different interpretations of solving the system of linear equations

"\/3+l" 



2 





(1.193) 



which has the solution $x = [1 \ 1]^T$. The rows are already of norm 1, and thus $\gamma_n = r_n$ and $y'_n = y_n$. Figure 1.34(a) shows the solution as a linear combination (1.190a) of column vectors $\{v_0, v_1\}$. In this particular case, $y = v_0 + v_1$ exactly, since $x_0 = x_1 = 1$. Figure 1.34(b) shows the solution as the intersection of linear constraints (1.190b), that is, the intersection of the two subspaces $S_0$ and $S_1$. These are orthogonal to the spans of $\gamma_0$ and $\gamma_1$ and lie at distances $y_0$ and $y_1$ from the origin, respectively. The intersection of $S_0$ and $S_1$ is exactly the solution $x = [1 \ 1]^T$. Finally, Figure 1.34(c) shows a few steps of the iterative algorithm (1.192), starting with $x^{(-1)} = 0$.

Complexity of Solving a Linear System of Equations The cost of the Gaussian 
elimination algorithm (or any other algorithm) provides an upper bound to the com- 
plexity of solving a general linear system of equations. The precise multiplicative 
complexity has not been proven. 

If the matrix A is structured, computational savings can be achieved. For example, we will see in the next chapter that when the matrix is circulant as in (1.227), the cost is $O(N \log_2 N)$ as in (2.261). This is because the discrete Fourier transform (DFT) diagonalizes the circulant convolution operator, and many fast algorithms for computing the DFT exist. Also, solvers with $O(N^2)$ cost exist for the cases where the matrix is Toeplitz as in (1.228) or Vandermonde as in (1.230). Further Reading gives pointers to literature on these algorithms.
Figure 1.34: Different views of solving the system of linear equations in (1.193). (a) As a linear combination (1.190a) of column vectors $\{v_0, v_1\}$. (b) As the intersection of linear constraints (1.190b). (c) As the solution to the iterative algorithm (1.192), starting with $x^{(-1)} = 0$.



Appendix 

1.A Elements of Analysis and Topology

This appendix reviews some basic elements of real analysis (under Lebesgue measure 
as applicable) and the standard topology on the real line. Some material is adapted 
from [ lOOlfl^] . 

1.A.1 Basic Definitions

Sets Let W be a subset of $\mathbb{R}$. An upper bound is a number M such that every w in W satisfies $w \le M$. The smallest of all upper bounds is called the supremum of W and denoted $\sup W$; if no upper bound exists, $\sup W = \infty$. A lower bound is a number m such that every w in W satisfies $w \ge m$. The largest of all lower bounds is called the infimum of W and denoted $\inf W$; if no lower bound exists, $\inf W = -\infty$.

The essential supremum and essential infimum are defined similarly but are based on bounds that can be violated by a countable number of points. An essential upper bound is a number M such that at most a countable number of w in W violate $w \le M$. The smallest of all essential upper bounds is called the essential supremum of W and denoted $\operatorname{ess\,sup} W$; if no essential upper bound exists, $\operatorname{ess\,sup} W = \infty$. An essential lower bound is a number m such that at most a countable number of w in W violate $w \ge m$. The largest of all essential lower bounds is called the essential infimum of W and denoted $\operatorname{ess\,inf} W$; if no essential lower bound exists, $\operatorname{ess\,inf} W = -\infty$.

Topology Let W be a subset of $\mathbb{R}$. An element $w \in W$ is an interior point if there is an $\varepsilon > 0$ such that $(w - \varepsilon, w + \varepsilon) \subset W$. A set is open if all its points are interior points. Facts about open sets include the following:

(i) $\mathbb{R}$ is open.
(ii) $\emptyset$ is open.
(iii) The union of any collection of open sets is open.
(iv) The intersection of finitely many open sets is open. 

A set is closed when its complement is open. By complementing the sets in the list above, facts about closed sets include the following:

(i) $\emptyset$ is closed.
(ii) $\mathbb{R}$ is closed.
(iii) The intersection of any collection of closed sets is closed.
(iv) The union of finitely many closed sets is closed.

The closure of a set W, denoted by $\overline{W}$, is the intersection of all closed sets containing W. It is also the set of all limit points of convergent sequences in the set. A set is closed if and only if it is equal to its closure. Also, a set is closed if and only if it contains the limit of every convergent sequence in the set.

Functions A function x takes an argument (input) t and produces a value (output) x(t). The acceptable values of the argument form the domain, while the possible function values form the range, also called the image. If the range is a subset of a larger set, that set is termed the codomain. The notation $x : D \to C$ indicates that x is a function with domain D and codomain C. A composition of functions uses the output of one function as the input to another. For example, $y(t) = x_2(x_1(t))$ will be denoted as $y : D \xrightarrow{x_1} C_1 \xrightarrow{x_2} C_2$. A function that maps a vector space into a vector space is called an operator.

A function is injective if $x(t_1) = x(t_2)$ implies that $t_1 = t_2$. In other words, different arguments must produce different values of the function. A function is surjective if the range equals the codomain, that is, if for every $y \in C$ there exists a $t \in D$ such that x(t) = y. A function is bijective if it is both injective and surjective. A bijective function $x : D \to C$ has an inverse $x^{-1} : C \to D$ such that $x^{-1}(x(t)) = t$ for all $t \in D$ and $x(x^{-1}(y)) = y$ for all $y \in C$. These concepts are illustrated in Figure 1.35.

1.A.2 Convergence

Sequences A sequence of numbers $a_0, a_1, \ldots$ is said to converge to the number a (written $\lim_{k\to\infty} a_k = a$) when the following holds:

for any $\varepsilon > 0$ there exists a number $K_\varepsilon$ such that $|a_k - a| < \varepsilon$ for every $k > K_\varepsilon$.

The sequence is said to diverge if it does not converge to any (finite) number. It diverges to ∞ (written $\lim_{k\to\infty} a_k = \infty$) when the following holds:

for any M there exists a number $K_M$ such that $a_k > M$ for every $k > K_M$.

Similarly, it diverges to −∞ (written $\lim_{k\to\infty} a_k = -\infty$) when the following holds:

for any M there exists a number $K_M$ such that $a_k < M$ for every $k > K_M$.

A few properties of convergence of sequences are derived in Exercise 1.63.
Figure 1.35: Examples of different types of functions $x : \mathbb{R} \to \mathbb{R}$. (a) $x(t) = t - 1$ for $t \in (-\infty, 0)$ and $x(t) = t + 1$ for $t \in [0, \infty)$: injective, but not surjective; the range is $\mathcal{R} = (-\infty, -1) \cup [1, \infty)$. (b) $x(t) = t(t - 1)(t + 1)$: surjective, but not injective. (c) $x(t) = t^3$: bijective (both injective and surjective). (d) $x(t) = t^{1/3}$: inverse of the bijective function from (c).



Series Let $a_0, a_1, \ldots$ be numbers. The numbers $s_n = \sum_{k=0}^{n} a_k$, n = 0, 1, ..., are called partial sums of the (infinite) series $\sum_{k=0}^{\infty} a_k$. The series is said to converge when the sequence of partial sums converges. We write $\sum_{k=0}^{\infty} a_k = \infty$ when the partial sums diverge to ∞, and $\sum_{k=0}^{\infty} a_k = -\infty$ when the partial sums diverge to −∞.

The series $\sum_{k=0}^{\infty} a_k$ is said to converge absolutely when $\sum_{k=0}^{\infty} |a_k|$ converges. A series that converges but does not converge absolutely is said to converge conditionally. The definition of convergence takes the terms of a series in a particular order. When a series is absolutely convergent, its terms can be reordered without altering its convergence or its value; otherwise not!²⁵ The doubly infinite series $\sum_{k=-\infty}^{\infty} a_k$ does not have a single natural choice of partial sums, so it is said to converge when it converges absolutely.
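The sensitivity of conditionally convergent series to reordering (the Riemann series theorem) can be seen numerically with the alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \cdots$, which converges to ln 2 in its natural order. The sketch below uses the standard greedy rearrangement. It takes unused positive terms while the partial sum is at most the target, and unused negative terms otherwise. Function names and the number of terms are illustrative.

```python
import math


def alternating_harmonic(num_terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... in the natural order (tends to ln 2)."""
    return sum((-1) ** k / (k + 1) for k in range(num_terms))


def rearranged_alternating_harmonic(target, num_terms):
    """Partial sum after num_terms terms of a greedy rearrangement of the same series:
    use the next unused positive term 1, 1/3, 1/5, ... while the sum is <= target,
    otherwise the next unused negative term -1/2, -1/4, ...."""
    s = 0.0
    next_odd, next_even = 1, 2
    for _ in range(num_terms):
        if s <= target:
            s += 1.0 / next_odd
            next_odd += 2
        else:
            s -= 1.0 / next_even
            next_even += 2
    return s


n = 100_000
print(alternating_harmonic(n), math.log(2))        # natural order: near ln 2
print(rearranged_alternating_harmonic(2.0, n))     # same terms, reordered: near 2
print(rearranged_alternating_harmonic(-1.0, n))    # ... or near -1
```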

Tests for convergence of series are reviewed in Exercise 1.64, and a few useful series are explored in Exercise 1.65.

²⁵ A strange and wonderful fact known as the Riemann series theorem is that a conditionally convergent series can be rearranged to converge to any desired value or to diverge!

Functions A sequence of real-valued functions $x_0, x_1, \ldots$ converges pointwise when, for any fixed t, the sequence of numbers $x_0(t), x_1(t), \ldots$ converges. More explicitly, suppose the functions have a common domain D. They converge pointwise to a function x with domain D when, for any $\varepsilon > 0$ and $t \in D$, there exists a number $K_{\varepsilon,t}$ (depending on ε and t) such that

$$ |x_k(t) - x(t)| < \varepsilon \quad \text{for all } k > K_{\varepsilon,t}. $$

A more restrictive form of convergence does not allow $K_{\varepsilon,t}$ to depend on t. A sequence $x_0, x_1, \ldots$ of real-valued functions on some domain D converges uniformly to $x : D \to \mathbb{R}$ when, for any $\varepsilon > 0$, there exists a number $K_\varepsilon$ (depending on ε) such that

$$ |x_k(t) - x(t)| < \varepsilon \quad \text{for all } t \in D \text{ and all } k > K_\varepsilon. $$

Uniform convergence implies pointwise convergence. Furthermore, if a sequence of continuous functions is uniformly convergent, the limit function is necessarily continuous.

1.A.3 Interchange Theorems

Many derivations in analysis involve interchanging the order of sums, integrals, and limits without changing the result. Without appropriate caution, this may be simply incorrect; refer again to Footnote 25.

Two nested summations can be seen as a single sum over a two-dimensional 
index set. Thus, since absolute convergence allows rearrangement of the terms in a 
sum, it allows changing the order of summations: 

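$$\sum_{n=0}^{\infty} \sum_{k=0}^{\infty} |x_{n,k}| < \infty \quad \text{implies} \quad \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} x_{n,k} = \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} x_{n,k}. \qquad (1.194)$$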

This extends to doubly-infinite summations and more than two summations.
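Without absolute summability the two orders really can disagree. A classic array (our illustration, not from the text) has $x_{n,k} = 1$ for $k = n$, $x_{n,k} = -1$ for $k = n + 1$, and 0 otherwise: every row sums to 0, while column 0 sums to 1 and every other column sums to 0. The sketch computes each inner sum exactly (each has at most two nonzero terms) and truncates only the outer sum; the function `x` and the constant `OUTER` are hypothetical names.

```python
def x(n, k):
    """Hypothetical example array: 1 on the diagonal, -1 just to its right."""
    if k == n:
        return 1
    if k == n + 1:
        return -1
    return 0

OUTER = 1000  # number of outer-sum terms kept

# Row n has nonzeros only at k = n, n + 1, so summing k < n + 2 is exact;
# column k has nonzeros only at n = k - 1, k, so summing n <= k is exact.
rows_first = sum(sum(x(n, k) for k in range(n + 2)) for n in range(OUTER))
cols_first = sum(sum(x(n, k) for n in range(k + 1)) for k in range(OUTER))
abs_row_sums = [sum(abs(x(n, k)) for k in range(n + 2)) for n in range(OUTER)]

print(rows_first)         # 0: sum over n of the row sums
print(cols_first)         # 1: sum over k of the column sums
print(sum(abs_row_sums))  # 2 * OUTER: grows without bound, so (1.194) does not apply
```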
The analogous result for integrals is called Fubini's theorem: 

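$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |x(t_1, t_2)| \, dt_1 \, dt_2 < \infty \quad \text{implies} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x(t_1, t_2) \, dt_1 \, dt_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x(t_1, t_2) \, dt_2 \, dt_1. \qquad (1.195)$$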



This extends to more than two integrals. 

Interchange of summation and integration can be justified by uniform convergence. Suppose a sequence of partial sums $s_n(t) = \sum_{k=0}^{n} x_k(t)$, $n = 0, 1, \ldots$, is uniformly convergent to $s(t)$ on $[a, b]$. Then the series may be integrated term by term:

$$\int_a^b \sum_{k=0}^{\infty} x_k(t) \, dt = \sum_{k=0}^{\infty} \int_a^b x_k(t) \, dt. \qquad (1.196)$$



This result extends to infinite intervals as well. 

Uniform convergence is rather restrictive as a justification for (1.196), since changing any of the $x_k$'s on a set of measure zero would change neither side of the equality. Another result on interchanging summation and integration is as follows. If $|x_k(t)| \le y_k(t)$ for all $k \in \mathbb{N}$ and almost all $t \in [a, b]$, $\sum_{k=0}^{\infty} y_k(t)$ converges for almost all $t \in [a, b]$, and $\sum_{k=0}^{\infty} \int_a^b y_k(t) \, dt < \infty$, then (1.196) holds. At the heart of this result is an application of the dominated convergence theorem to the sequence of partial sums $s_n(t) = \sum_{k=0}^{n} x_k(t)$, $n = 0, 1, \ldots$.

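The dominated convergence theorem enables interchange of a limit with an integral: Let $x_0, x_1, \ldots$ be real-valued functions such that $\lim_{k\to\infty} x_k(t) = x(t)$ for almost all $t \in \mathbb{R}$. If there exists a (nonnegative) real-valued function y such that, for all $k \in \mathbb{N}$, $|x_k(t)| \le y(t)$ for almost all $t \in \mathbb{R}$, and

$$\int_{-\infty}^{\infty} y(t) \, dt < \infty,$$

then x is integrable, and

$$\int_{-\infty}^{\infty} \Bigl( \lim_{k\to\infty} x_k(t) \Bigr) \, dt = \lim_{k\to\infty} \int_{-\infty}^{\infty} x_k(t) \, dt. \qquad (1.197)$$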

1.A.4 Inequalities 

See [149] for elementary proofs and further details. 



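Minkowski's Inequality For any $p \in [1, \infty)$,

$$\Bigl( \sum_{k\in\mathbb{Z}} |x_k + y_k|^p \Bigr)^{1/p} \le \Bigl( \sum_{k\in\mathbb{Z}} |x_k|^p \Bigr)^{1/p} + \Bigl( \sum_{k\in\mathbb{Z}} |y_k|^p \Bigr)^{1/p}. \qquad (1.198a)$$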



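This establishes that the $\ell^p$ norm (1.36a) satisfies the triangle inequality in Definition 1.9. Also, for any $p \in [1, \infty)$,

$$\Bigl( \int_a^b |x(t) + y(t)|^p \, dt \Bigr)^{1/p} \le \Bigl( \int_a^b |x(t)|^p \, dt \Bigr)^{1/p} + \Bigl( \int_a^b |y(t)|^p \, dt \Bigr)^{1/p}, \qquad (1.198b)$$

establishing that the $\mathcal{L}^p$ norm (1.38a) satisfies the triangle inequality. Analogues of (1.198) hold for $\ell^\infty$ and $\mathcal{L}^\infty$ as well:

$$\sup_{k\in\mathbb{Z}} |x_k + y_k| \le \sup_{k\in\mathbb{Z}} |x_k| + \sup_{k\in\mathbb{Z}} |y_k| \qquad (1.199a)$$

and

$$\sup_{t\in\mathbb{R}} |x(t) + y(t)| \le \sup_{t\in\mathbb{R}} |x(t)| + \sup_{t\in\mathbb{R}} |y(t)|. \qquad (1.199b)$$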

Hölder's Inequality Let p and q in $[1, \infty]$ satisfy $1/p + 1/q = 1$, with the convention that $1/\infty = 0$ is allowed. Then p and q are called Hölder conjugates and

$$\|xy\|_1 \le \|x\|_p \, \|y\|_q \qquad (1.200)$$

for sequences or functions x and y, with equality if and only if $|x|^p$ and $|y|^q$ are scalar multiples of each other. The case of p = q = 2 is the Cauchy-Schwarz inequality.

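Specializing (1.200) to sequences gives

$$\sum_{k\in\mathbb{Z}} |x_k y_k| \le \Bigl( \sum_{k\in\mathbb{Z}} |x_k|^p \Bigr)^{1/p} \Bigl( \sum_{k\in\mathbb{Z}} |y_k|^q \Bigr)^{1/q} \qquad (1.201a)$$

for finite p and q, and

$$\sum_{k\in\mathbb{Z}} |x_k y_k| \le \Bigl( \sup_{k\in\mathbb{Z}} |x_k| \Bigr) \Bigl( \sum_{k\in\mathbb{Z}} |y_k| \Bigr) \qquad (1.201b)$$

for p = ∞. Similarly, for functions

$$\int_{-\infty}^{\infty} |x(t) y(t)| \, dt \le \Bigl( \int_{-\infty}^{\infty} |x(t)|^p \, dt \Bigr)^{1/p} \Bigl( \int_{-\infty}^{\infty} |y(t)|^q \, dt \Bigr)^{1/q} \qquad (1.202a)$$

for finite p and q, and

$$\int_{-\infty}^{\infty} |x(t) y(t)| \, dt \le \sup_{t\in\mathbb{R}} |x(t)| \int_{-\infty}^{\infty} |y(t)| \, dt \qquad (1.202b)$$

for p = ∞.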
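A quick numerical sanity check of (1.200) and (1.198a) on finite sequences follows, using random data from a fixed seed. The helper names `lp_norm` and `holder_conjugate` are ours. Checking random instances illustrates the inequalities; it does not prove them.

```python
import random

INF = float("inf")

def lp_norm(v, p):
    """ell^p norm of a finite sequence; p = INF gives the sup norm."""
    if p == INF:
        return max(abs(a) for a in v)
    return sum(abs(a) ** p for a in v) ** (1.0 / p)

def holder_conjugate(p):
    """q with 1/p + 1/q = 1, using the convention 1/INF = 0."""
    if p == 1.0:
        return INF
    if p == INF:
        return 1.0
    return p / (p - 1.0)

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(50)]
y = [rng.uniform(-1, 1) for _ in range(50)]
tol = 1e-12  # slack for floating-point rounding

results = []
for p in (1.0, 1.5, 2.0, 3.0, INF):
    q = holder_conjugate(p)
    holder_ok = sum(abs(a * b) for a, b in zip(x, y)) <= lp_norm(x, p) * lp_norm(y, q) + tol
    minkowski_ok = (lp_norm([a + b for a, b in zip(x, y)], p)
                    <= lp_norm(x, p) + lp_norm(y, p) + tol)
    results.append((p, holder_ok, minkowski_ok))
print(results)

# Equality case of Cauchy-Schwarz (p = q = 2): z is a scalar multiple of x.
z = [2.0 * a for a in x]
gap = lp_norm(x, 2) * lp_norm(z, 2) - sum(abs(a * b) for a, b in zip(x, z))
print(gap)  # zero up to rounding
```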



1.A.5 Integration by Parts 

Integration by parts transforms an integral into another integral, possibly easier to solve. It can be written very compactly as

$$\int u \, dv = uv - \int v \, du \qquad (1.203a)$$

or more explicitly as

$$\int_a^b u(t) v'(t) \, dt = u(t) v(t) \Big|_a^b - \int_a^b v(t) u'(t) \, dt. \qquad (1.203b)$$
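As a small worked instance of (1.203b), chosen here for illustration, take $u(t) = t$ and $v(t) = e^t$ on $[0, 1]$, so that $u'(t) = 1$ and $v'(t) = e^t$:

$$\int_0^1 t \, e^t \, dt = t \, e^t \Big|_0^1 - \int_0^1 e^t \, dt = e - (e - 1) = 1.$$

The choice works because the remaining integral, $\int_0^1 e^t \, dt$, is elementary.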

1.B Elements of Linear Algebra

This appendix reviews basic concepts in linear algebra. Good sources for more details include [76, 140]. Contrary to the standard convention in finite-dimensional linear algebra, we start all indexing at 0 rather than 1; this facilitates consistency throughout the book.
1.B.1 Basic Definitions and Properties

We say a matrix A is M × N or in $\mathbb{C}^{M\times N}$ when it has M rows and N columns. It is a linear operator mapping $\mathbb{C}^N$ into $\mathbb{C}^M$. When M = N, the matrix is called square; otherwise it is rectangular.²⁶ An M × 1 matrix is called a column vector, a 1 × N matrix a row vector, and a 1 × 1 matrix a scalar. Unless stated explicitly otherwise, $A_{m,n}$ denotes the row-m, column-n entry of matrix A.

Basic Operations Addition of matrices is element-by-element, so matrices can only be added if they have the same dimensions. The product of $A \in \mathbb{C}^{M\times P}$ and $B \in \mathbb{C}^{Q\times N}$ is defined only when P = Q, in which case it is given by

$$(AB)_{m,n} = \sum_{k=0}^{P-1} A_{m,k} B_{k,n}, \qquad m = 0, 1, \ldots, M-1, \quad n = 0, 1, \ldots, N-1. \qquad (1.204)$$
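Formula (1.204) translates directly into code with the zero-based indexing used here. This is a minimal sketch with matrices stored as lists of rows; `matmul` and `transpose` are our helper names, and the example also checks the identity $(AB)^T = B^T A^T$ stated below.

```python
def matmul(A, B):
    """Product (1.204) of an M x P matrix A and a P x N matrix B (lists of rows)."""
    M, P = len(A), len(A[0])
    Q, N = len(B), len(B[0])
    if P != Q:
        raise ValueError("inner dimensions differ: P != Q")
    return [[sum(A[m][k] * B[k][n] for k in range(P)) for n in range(N)]
            for m in range(M)]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]           # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]              # 3 x 2
AB = matmul(A, B)         # 2 x 2
print(AB)                 # [[4, 5], [10, 11]]
print(transpose(AB) == matmul(transpose(B), transpose(A)))  # True
```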

Entries $A_{m,n}$ are on the (main) diagonal if m = n. A square matrix with unit diagonal entries and zero off-diagonal entries is called an identity matrix and denoted by I. It is the identity element under matrix multiplication. For square matrices A and B, if AB = I and BA = I, B is called the inverse of A and is written as $A^{-1}$. If no such B exists, A is called singular. When the inverses exist, $(AB)^{-1} = B^{-1} A^{-1}$. Rectangular matrices do not have inverses. Instead, a short matrix can have a right inverse B, so AB = I; similarly, a tall matrix can have a left inverse B, so BA = I.

If $A_{m,n} = B_{n,m}$ for all m and n, we write $A = B^T$; we call B the transpose of A. If $A_{m,n} = B^*_{n,m}$ for all m and n, we write $A = B^*$; we call B the Hermitian transpose of A. Here, * denotes both complex conjugation of a scalar and the combination of complex conjugation and transposition of a matrix. In general, $(AB)^T = B^T A^T$ and $(AB)^* = B^* A^*$.

Determinant The determinant maps a square matrix into a scalar value. It is defined recursively, with det(a) = a for a scalar a and

$$\det(A) = \sum_{k=0}^{N-1} (-1)^{i+k} \det(M_{i,k}) A_{i,k} = \sum_{k=0}^{N-1} C_{i,k} A_{i,k}$$

for $A \in \mathbb{C}^{N\times N}$, where minor $M_{i,k}$ is the $(N-1)\times(N-1)$ matrix obtained by deleting the ith row and kth column of A; cofactor $C_{i,k} = (-1)^{i+k} \det(M_{i,k})$ will be used later to define the adjugate. This definition is valid because the same result is obtained for any choice of $i \in \{0, 1, \ldots, N-1\}$; to simplify computations, one may choose i to minimize the number of nonzero terms in the sum.

The determinant of $A \in \mathbb{C}^{N\times N}$ has several useful properties, including:

(i) For a scalar α, $\det(\alpha A) = \alpha^N \det(A)$.
(ii) If B is obtained by interchanging two rows or two columns of A, then $\det(B) = -\det(A)$.
(iii) $\det(A^T) = \det(A)$.
(iv) $\det(AB) = \det(A) \det(B)$.
(v) If A is triangular, that is, all of its elements above or below the main diagonal are 0, $\det(A)$ is the product of the diagonal elements of A.
(vi) A is singular if and only if $\det(A) = 0$.

²⁶ We sometimes call a matrix with M > N tall; similarly, we call a matrix with M < N short.
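The recursive definition, including the suggestion to expand along the row with the most zeros, can be sketched in a few lines of Python. Exact `Fraction` arithmetic is our choice so that the property checks compare exactly; the helpers `minor` and `det` are hypothetical names, and the N! cost makes this suitable only for small matrices.

```python
from fractions import Fraction

def minor(A, i, k):
    """The (N-1) x (N-1) matrix M_{i,k}: A with row i and column k deleted."""
    return [row[:k] + row[k + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along one row (zero-based indices).

    The row with the most zero entries is used, so fewer cofactors are needed.
    """
    N = len(A)
    if N == 1:
        return A[0][0]
    i = max(range(N), key=lambda r: sum(1 for a in A[r] if a == 0))
    return sum((-1) ** (i + k) * det(minor(A, i, k)) * A[i][k]
               for k in range(N) if A[i][k] != 0)

A = [[Fraction(v) for v in row] for row in ([2, 0, 1], [1, 3, 0], [0, 1, 4])]
At = [list(col) for col in zip(*A)]
swapped = [A[1], A[0], A[2]]

print(det(A))                                              # 25
print(det(At) == det(A))                                   # property (iii)
print(det([[2 * a for a in r] for r in A]) == 8 * det(A))  # property (i) with N = 3
print(det(swapped) == -det(A))                             # property (ii)
```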

The final property relating the determinant to invertibility has both a geometric interpretation and a connection to a formula for a matrix inverse:

(i) When the matrix is real, the absolute value of the determinant is the volume of the parallelepiped that has the column vectors of the matrix as edges. Thus, a zero determinant indicates linear dependence of the columns of the matrix, since the parallelepiped is not of full dimension. (The row vectors lead to a different parallelepiped with the same volume.)

(ii) The inverse of a nonsingular matrix A is given by Cramer's formula:

$$A^{-1} = \frac{\operatorname{adj}(A)}{\det(A)}, \qquad (1.205)$$

where the adjugate of A is the transpose of the matrix of cofactors of A: $(\operatorname{adj}(A))_{i,k} = C_{k,i}$. Cramer's formula is useful for finding inverses of small matrices by hand and as an analytical tool; it does not yield computationally efficient techniques for inversion.
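For a small matrix, (1.205) can be applied literally. The sketch below, our illustration with hypothetical helper names, builds all cofactors, transposes them into the adjugate, divides by the determinant using exact fractions, and confirms that $A A^{-1} = I$.

```python
from fractions import Fraction

def minor(A, i, k):
    return [row[:k] + row[k + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    # Cofactor expansion along row 0 (adequate for small matrices).
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(len(A)))

def inverse_cramer(A):
    """Inverse via (1.205): entry (i, k) of adj(A) is the cofactor C_{k,i}."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    N = len(A)
    cof = [[(-1) ** (i + k) * det(minor(A, i, k)) for k in range(N)] for i in range(N)]
    return [[Fraction(cof[k][i]) / d for k in range(N)] for i in range(N)]

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
Ainv = inverse_cramer(A)
I = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(I == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```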

Range, Null Space, and Rank Associated with any matrix $A \in \mathbb{R}^{M\times N}$ are four fundamental subspaces. The range or column space of A is the span of the columns of A and thus a subspace of $\mathbb{R}^M$; it can be written as

$$\mathcal{R}(A) = \operatorname{span}(\{a_0, a_1, \ldots, a_{N-1}\}) = \{y \in \mathbb{R}^M \mid y = Ax \text{ for some } x \in \mathbb{R}^N\}, \qquad (1.206a)$$

where $a_0, a_1, \ldots, a_{N-1}$ are the columns of A. Linear combinations of rows of A are all row vectors $y^T A$ where $y \in \mathbb{R}^M$. Taking these as column vectors gives the row space of A, which is the range of $A^T$ and a subspace of $\mathbb{R}^N$:

$$\mathcal{R}(A^T) = \operatorname{span}(\{b_0, b_1, \ldots, b_{M-1}\}) = \{x \in \mathbb{R}^N \mid x = A^T y \text{ for some } y \in \mathbb{R}^M\}, \qquad (1.206b)$$

where $b_0, b_1, \ldots, b_{M-1}$ are the rows of A, taken as column vectors. The null space or kernel of A is the set of vectors that A maps to 0 (a subspace of $\mathbb{R}^N$):

$$\mathcal{N}(A) = \{x \in \mathbb{R}^N \mid Ax = 0\}. \qquad (1.206c)$$

The left null space is the set of vectors mapped to zero when multiplied on the right by A, taken as column vectors. Since $y^T A = 0$ is equivalent to $A^T y = 0$, the left null space of A is the null space of $A^T$ (a subspace of $\mathbb{R}^M$):

$$\mathcal{N}(A^T) = \{y \in \mathbb{R}^M \mid A^T y = 0\}. \qquad (1.206d)$$

The four fundamental subspaces provide orthogonal decompositions of $\mathbb{R}^N$ and $\mathbb{R}^M$ as depicted in Figure 1.36. As shown, the null space is the orthogonal complement of the row space; the left null space is the orthogonal complement of the range (column space); A maps the null space to 0; $A^T$ maps the left null space to 0; and A and $A^T$ map between the row space and column space, which are of equal dimension. Properties of the subspaces are summarized for the complex case in Table 1.2.

[Figure 1.36: A maps $\mathbb{R}^N$ to $\mathbb{R}^M$ and $A^T$ maps back. In $\mathbb{R}^N$: row space $\mathcal{R}(A^T) \perp$ null space $\mathcal{N}(A)$, with $\dim(\mathcal{R}(A^T)) + \dim(\mathcal{N}(A)) = N$. In $\mathbb{R}^M$: column space $\mathcal{R}(A) \perp$ left null space $\mathcal{N}(A^T)$, with $\dim(\mathcal{R}(A)) + \dim(\mathcal{N}(A^T)) = M$.]

Figure 1.36: The four fundamental subspaces associated with a real matrix $A \in \mathbb{R}^{M\times N}$. The matrix determines an orthogonal decomposition of $\mathbb{R}^N$ into the row space of A and the null space of A; and an orthogonal decomposition of $\mathbb{R}^M$ into the column space (range) of A and the left null space of A. The column and row spaces of A have the same dimension, which equals the rank of A. (Figure inspired by the cover of [141].)

Table 1.2: Summary of spaces and related characteristics for a complex matrix $A \in \mathbb{C}^{M\times N}$ (illustrated in Figure 1.36 for a real matrix $A \in \mathbb{R}^{M\times N}$).

Space                  Symbol               Definition                                                                      Dimension
Column space (range)   $\mathcal{R}(A)$     $\{y \in \mathbb{C}^M \mid y = Ax \text{ for some } x \in \mathbb{C}^N\}$       $\operatorname{rank}(A)$
Left null space        $\mathcal{N}(A^*)$   $\{y \in \mathbb{C}^M \mid A^* y = 0\}$                                          $M - \operatorname{rank}(A)$
Row space              $\mathcal{R}(A^*)$   $\{x \in \mathbb{C}^N \mid x = A^* y \text{ for some } y \in \mathbb{C}^M\}$     $\operatorname{rank}(A)$
Null space (kernel)    $\mathcal{N}(A)$     $\{x \in \mathbb{C}^N \mid Ax = 0\}$                                             $N - \operatorname{rank}(A)$

The dimensions add up in each space: $\dim(\mathcal{R}(A)) + \dim(\mathcal{N}(A^*)) = M$ and $\dim(\mathcal{R}(A^*)) + \dim(\mathcal{N}(A)) = N$.

The rank is defined by

$$\operatorname{rank}(A) = \dim(\mathcal{R}(A)).$$



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavclets.org 



Fourier and Wavelet Signal Processing Copyright 2011 m. Vetterii, j. Kovaccvic, and v. k. Goyai 



l.B. Elements of Linear Algebra 137 

It satisfies $\operatorname{rank}(A) = \operatorname{rank}(A^*)$ and $\operatorname{rank}(AB) \le \min(\operatorname{rank}(A), \operatorname{rank}(B))$.
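The rank can be computed by Gaussian elimination, counting pivots; this is a different route from the SVD characterization given below. The sketch uses exact fractions to avoid round-off deciding whether a pivot is zero; `rank` and `transpose` are our helper names and the example matrices are arbitrary. It checks $\operatorname{rank}(A) = \operatorname{rank}(A^T)$, the product bound, and $\dim(\mathcal{N}(A)) = N - \operatorname{rank}(A)$ from Table 1.2.

```python
from fractions import Fraction

def rank(A):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    R = [[Fraction(a) for a in row] for row in A]
    M, N = len(R), len(R[0])
    r = 0                                  # pivots found so far
    for c in range(N):
        pivot = next((i for i in range(r, M) if R[i][c] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(r + 1, M):
            factor = R[i][c] / R[r][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == M:
            break
    return r

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1, 2, 3],
     [2, 4, 6],                            # twice the first row
     [1, 0, 1]]
B = [[1, 0], [0, 0], [0, 1]]
AB = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(2)] for i in range(3)]

print(rank(A), rank(transpose(A)))         # 2 2
print(rank(AB), min(rank(A), rank(B)))     # 2 2
print(len(A[0]) - rank(A))                 # 1 = dim of the null space of A
```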

Systems of Linear Equations and Least Squares The product Ax describes a linear combination of the columns of A weighted by the entries of x. In solving a system of linear equations,

$$Ax = y, \quad \text{where } A \in \mathbb{R}^{M\times N}, \qquad (1.207)$$

we encounter the following possibilities depending on whether y belongs to the range (column space) of A, $y \in \mathcal{R}(A)$, and whether the columns of A are linearly independent:

(i) Unique solution: If y belongs to the range of A and the columns of A are linearly independent ($\operatorname{rank}(A) = N$), there is a unique solution.
(ii) Infinitely many solutions: If y belongs to the range of A and the columns of A are not linearly independent ($\operatorname{rank}(A) < N$), there are infinitely many solutions.
(iii) No solution: If y does not belong to the range of A, there is no solution. Only approximations are possible.

Cases with and without solutions are unified by looking for a least squares solution $\hat{x}$, meaning one that minimizes $\|y - \hat{y}\|_2$, where $\hat{y} = A\hat{x}$. This is obtained from the orthogonality principle: the error $y - \hat{y}$ is orthogonal to the range of A, leading to the normal equations,

$$A^T A \hat{x} = A^T y. \qquad (1.208a)$$

When $A^T A$ is invertible ($\operatorname{rank}(A) = N$), the unique least squares solution is

$$\hat{x} = (A^T A)^{-1} A^T y. \qquad (1.208b)$$

When A is square, the invertibility of $A^T A$ implies $y \in \mathcal{R}(A)$, and the least squares solution simplifies to the exact solution $\hat{x} = A^{-1} y$.

When $A^T A$ is not invertible ($\operatorname{rank}(A) < N$), the minimization of $\|y - A\hat{x}\|_2$ does not have a unique solution, so we additionally minimize $\|\hat{x}\|_2$. When $A A^T$ is invertible ($\operatorname{rank}(A) = M$), this solution is

$$\hat{x} = A^T (A A^T)^{-1} y. \qquad (1.208c)$$

The solutions (1.208b) and (1.208c) show the two forms of the pseudoinverse of A for $\operatorname{rank}(A) = \min(M, N)$. Multiplication by the pseudoinverse solves the least squares problem for the case of $\operatorname{rank}(A) < \min(M, N)$ as well; the pseudoinverse is conveniently expressed using the singular value decomposition of A below. Figure 1.37 illustrates the discussion.
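Both full-rank cases can be worked exactly for tiny matrices. The sketch below, with example matrices of our own choosing and hypothetical helper names, solves the normal equations (1.208a) for a tall matrix and checks that the residual is orthogonal to the columns; it then forms the minimum-norm solution (1.208c) for a short matrix. It only handles the 2 × 2 Gram matrices that arise here.

```python
from fractions import Fraction

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve2(G, b):
    """Solve the 2 x 2 system G z = b exactly (G assumed invertible)."""
    (p, q), (r, s) = G
    d = Fraction(p * s - q * r)
    return [(s * b[0] - q * b[1]) / d, (p * b[1] - r * b[0]) / d]

# Tall, full column rank: normal equations (1.208a), solution (1.208b).
A = [[1, 0], [0, 1], [1, 1]]
y = [1, 2, 4]
At = transpose(A)
x_ls = solve2(matmul(At, A), [sum(a * v for a, v in zip(row, y)) for row in At])
residual = [v - sum(a * xi for a, xi in zip(row, x_ls)) for row, v in zip(A, y)]
print(x_ls)                                                     # 4/3 and 7/3
print([sum(a * e for a, e in zip(row, residual)) for row in At])  # both 0

# Short, full row rank: minimum-norm solution (1.208c).
B = [[1, 1, 0], [0, 1, 1]]
z = [1, 2]
Bt = transpose(B)
w = solve2(matmul(B, Bt), z)                  # (B B^T)^{-1} z
x_mn = [sum(a * wi for a, wi in zip(row, w)) for row in Bt]
print(x_mn)                                   # 0, 1, 1
```

Any other solution of $Bx = z$ differs from `x_mn` by a multiple of $[1\ {-1}\ 1]^T$, which spans the null space of B and is orthogonal to `x_mn`, so it has a larger norm.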

Eigenvalues, Eigenvectors, and Spectral Decomposition A number λ and nonzero vector v are called an eigenvalue and eigenvector of a square matrix A (also, an eigenpair) when

$$Av = \lambda v, \qquad (1.209)$$
[Figure 1.37: three panels, (a), (b), and (c), each showing a matrix A together with $y \in \mathbb{R}^3$.]

Figure 1.37: Illustration of solutions to Ax = y in $\mathbb{R}^3$, with $y = [1\ 1\ 1]^T$. (a) Unique solution: $y \in \mathcal{R}(A)$ and the columns of A are linearly independent. The unique solution is $x = [1\ {-1}\ 1]^T$. (b) Infinitely many solutions: $y \in \mathcal{R}(A)$ and the columns of A are not linearly independent. One of the possible solutions is $x = [1\ 1\ 0]^T$. (c) No solution: $y \notin \mathcal{R}(A)$ and the columns of A are linearly independent. The unique approximate solution of minimum 2-norm minimizing the error is $\hat{x} = [1\ 0]^T$, giving $\hat{y} = [1\ 1\ 0]^T$.



as seen for general linear operators in (1.53). The eigenvalues are the roots of the characteristic polynomial $\det(xI - A)$. When all eigenvalues of A are real, $\lambda_{\max}(A)$ denotes the largest eigenvalue and $\lambda_{\min}(A)$ the smallest eigenvalue. Especially when the eigenvalues are nonnegative, it is conventional to list them in nonincreasing order, $\lambda_0(A) \ge \lambda_1(A) \ge \cdots \ge \lambda_{N-1}(A)$.

If an N × N matrix A has N linearly independent eigenvectors, then it can be written as

$$A = V \Lambda V^{-1}, \qquad (1.210a)$$

where Λ is a diagonal matrix containing the eigenvalues of A along the diagonal and V contains the eigenvectors of A as its columns. This is called the spectral theorem. Since the eigenvectors form a basis in this case, a vector x can be written as a linear combination of eigenvectors $x = \sum_{k=0}^{N-1} \alpha_k v_k$, and

$$Ax = A \Bigl( \sum_{k=0}^{N-1} \alpha_k v_k \Bigr) \overset{(a)}{=} \sum_{k=0}^{N-1} \alpha_k (A v_k) \overset{(b)}{=} \sum_{k=0}^{N-1} \alpha_k \lambda_k v_k, \qquad (1.210b)$$

where (a) follows from the linearity of A; and (b) from (1.209). Expressions (1.210a) and (1.210b) are both diagonalizations. The first shows $V^{-1} A V$ is a diagonal matrix; the second shows that expressing the input to operator A using the coordinates specified by the eigenvectors of A makes the action of A diagonal. Combining properties of the determinant that we have seen earlier with (1.210a),

$$\det(A) = \det(V \Lambda V^{-1}) = \det(V V^{-1}) \det(\Lambda) = \prod_{k=0}^{N-1} \lambda_k. \qquad (1.211)$$

The conclusion $\det(A) = \prod_{k=0}^{N-1} \lambda_k$ holds even for matrices without full sets of eigenvectors, as long as eigenvalues are counted with multiplicities.

The trace is defined for square matrices as the sum of the diagonal entries. The trace of a product is invariant to cyclic permutation of the factors, for example $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$. It follows that the trace is invariant to similarity transformations: $\operatorname{tr}(BAB^{-1}) = \operatorname{tr}(AB^{-1}B) = \operatorname{tr}(A)$. The trace is given by the sum of eigenvalues (counted with multiplicities),

$$\operatorname{tr}(A) = \sum_{k=0}^{N-1} \lambda_k, \qquad (1.212)$$

which is justified by (1.210a) for diagonalizable A.
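For a 2 × 2 matrix everything above can be checked by hand or in a few lines. The sketch below (our illustration; `eig2` is a hypothetical helper that assumes the upper-right entry b is nonzero, so that $[b,\ \lambda - a]^T$ is a nonzero eigenvector) finds the eigenvalues as roots of the characteristic polynomial, verifies (1.209) for each pair, rebuilds A from (1.210a), and compares the sum and product of eigenvalues with (1.212) and (1.211).

```python
import cmath

def eig2(A):
    """Eigenpairs of a 2 x 2 matrix [[a, b], [c, d]], assuming b != 0."""
    (a, b), (c, d) = A
    tr, dt = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * dt)
    lams = [(tr + disc) / 2, (tr - disc) / 2]  # roots of x^2 - tr(A) x + det(A)
    vecs = [[b, lam - a] for lam in lams]      # satisfies (A - lam I) v = 0
    return lams, vecs

A = [[2.0, 1.0], [4.0, -1.0]]
lams, vecs = eig2(A)

# Check A v = lam v for each pair, as in (1.209).
for lam, v in zip(lams, vecs):
    Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
    assert all(abs(p - lam * q) < 1e-12 for p, q in zip(Av, v))

# Rebuild A = V Lambda V^{-1}, (1.210a); V has the eigenvectors as columns.
V = [[vecs[0][0], vecs[1][0]], [vecs[0][1], vecs[1][1]]]
dV = V[0][0] * V[1][1] - V[0][1] * V[1][0]
Vinv = [[V[1][1] / dV, -V[0][1] / dV], [-V[1][0] / dV, V[0][0] / dV]]
VL = [[V[i][j] * lams[j] for j in range(2)] for i in range(2)]
A_rebuilt = [[sum(VL[i][k] * Vinv[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]

print(lams)                          # approximately 3 and -2
print(sum(lams), lams[0] * lams[1])  # approximately 1 (the trace) and -6 (the determinant)
```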

Singular Value Decomposition Singular value decomposition (SVD) provides a diagonalization that applies to any rectangular or square matrix. An M × N real or complex matrix A can be factored as follows:

$$A = U \Sigma V^*, \qquad (1.213)$$

where U is an M × M unitary matrix, V is an N × N unitary matrix, and Σ is an M × N matrix with nonnegative real values $\{\sigma_k\}_{k=0}^{\min(M,N)-1}$, called singular values, on the main diagonal and zeros elsewhere. The columns of U are called left singular vectors and the columns of V are called right singular vectors. As for eigenvalues, $\sigma_{\max}(A)$ denotes the largest singular value and $\sigma_{\min}(A)$ the smallest singular value. Also as for eigenvalues, it is conventional to list singular values in nonincreasing order, $\sigma_{\max}(A) = \sigma_0(A) \ge \sigma_1(A) \ge \cdots \ge \sigma_{\min(M,N)-1}(A) = \sigma_{\min}(A)$. The number of nonzero singular values is the rank of A. The pseudoinverse of A is

$$A^{\dagger} = V \Sigma^{\dagger} U^*, \qquad (1.214)$$

where $\Sigma^{\dagger}$ is the N × M matrix with $1/\sigma_k$ in the (k, k) position for each nonzero singular value and zeros elsewhere.

The following fact relates singular value and eigendecompositions (see also Exercise 1.67): Using the singular value decomposition (1.213),

$$A A^* = (U \Sigma V^*)(V \Sigma^* U^*) = U \Sigma \Sigma^* U^*,$$
$$A^* A = (V \Sigma^* U^*)(U \Sigma V^*) = V \Sigma^* \Sigma V^*,$$

so the squares of the nonzero singular values of A are the nonzero eigenvalues of $AA^*$ and $A^*A$, that is,

$$\sigma^2(A) = \lambda(A A^*) = \lambda(A^* A), \quad \text{for } \lambda \neq 0. \qquad (1.215)$$
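For a real 2 × 2 matrix, (1.215) gives the singular values as square roots of the eigenvalues of $A^T A$. The sketch below is our illustration: `sym_eig2` is a hypothetical helper using the closed-form eigenvalues of a symmetric 2 × 2 matrix, and the example matrix is arbitrary. It also checks that $AA^T$ has the same eigenvalues, that $\sigma_0 \sigma_1 = |\det(A)|$ (since $\det(A^T A) = \det(A)^2$), and that $\sigma_0^2 + \sigma_1^2 = \operatorname{tr}(A^T A) = \|A\|_F^2$.

```python
import math

def sym_eig2(G):
    """Eigenvalues (largest first) of a real symmetric 2 x 2 matrix [[p, q], [q, s]]."""
    (p, q), (_, s) = G
    mean, radius = (p + s) / 2, math.hypot((p - s) / 2, q)
    return mean + radius, mean - radius

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[3.0, 0.0], [4.0, 5.0]]
lam_AtA = sym_eig2(matmul(transpose(A), A))   # eigenvalues of A^T A
lam_AAt = sym_eig2(matmul(A, transpose(A)))   # eigenvalues of A A^T
sigma = [math.sqrt(max(lam, 0.0)) for lam in lam_AtA]

print(lam_AtA, lam_AAt)   # both (45.0, 5.0)
print(sigma)              # 3*sqrt(5) and sqrt(5), about 6.708 and 2.236
```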
Matrix Norms Norms on matrices must satisfy the conditions in Definition 1.9. Many commonly-used norms on M × N matrices are operator norms induced by norms on M- and N-dimensional vectors, as in Definition 1.18. Using the vector norms defined in (1.35a) and (1.35b),

$$\|A\|_{p,q} = \sup_{\|x\|_p = 1} \|Ax\|_q, \qquad (1.216)$$

and $\|A\|_{p,p}$ is denoted $\|A\|_p$. A few of these simplify as follows:

$$\|A\|_1 = \|A\|_{1,1} = \max_{0 \le j \le N-1} \sum_{i=0}^{M-1} |A_{i,j}|,$$
$$\|A\|_{1,2} = \max_{0 \le j \le N-1} \Bigl( \sum_{i=0}^{M-1} |A_{i,j}|^2 \Bigr)^{1/2},$$
$$\|A\|_{1,\infty} = \max_{0 \le i \le M-1,\ 0 \le j \le N-1} |A_{i,j}|,$$
$$\|A\|_2 = \|A\|_{2,2} = \sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^* A)},$$
$$\|A\|_{2,\infty} = \max_{0 \le i \le M-1} \Bigl( \sum_{j=0}^{N-1} |A_{i,j}|^2 \Bigr)^{1/2},$$
$$\|A\|_\infty = \|A\|_{\infty,\infty} = \max_{0 \le i \le M-1} \sum_{j=0}^{N-1} |A_{i,j}|.$$

The most common matrix norm that is not an operator norm is the Frobenius norm,

$$\|A\|_F = \sqrt{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} |A_{i,j}|^2} = \sqrt{\operatorname{tr}(A A^*)}. \qquad (1.217)$$
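The closed forms above are easy to evaluate directly. This sketch (our illustration, with hypothetical function names and an arbitrary real example) computes four of them and cross-checks the row-sum formula for $\|A\|_\infty$ by brute force: $\|Ax\|_\infty$ is convex in x, so its maximum over the unit $\ell^\infty$ ball is attained at one of the sign vectors that form the ball's corners.

```python
import math
from itertools import product

def norm_1(A):
    """||A||_1 = ||A||_{1,1}: largest column sum of magnitudes."""
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def norm_inf(A):
    """||A||_inf = ||A||_{inf,inf}: largest row sum of magnitudes."""
    return max(sum(abs(a) for a in row) for row in A)

def norm_1_inf(A):
    """||A||_{1,inf}: largest entry magnitude."""
    return max(abs(a) for row in A for a in row)

def norm_fro(A):
    """Frobenius norm (1.217)."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

A = [[1.0, -2.0, 0.0],
     [3.0, 4.0, -1.0]]
print(norm_1(A), norm_inf(A), norm_1_inf(A), norm_fro(A))  # 6.0 8.0 4.0 and sqrt(31)

# Brute force over the corners of the unit ell^inf ball.
brute_inf = max(max(abs(sum(a * s for a, s in zip(row, signs))) for row in A)
                for signs in product((-1.0, 1.0), repeat=len(A[0])))
print(brute_inf)  # 8.0, matching norm_inf(A)
```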

1.B.2 Special Matrices 

Unitary and Orthogonal Matrices A square matrix U is called unitary when it satisfies

$$U^* U = U U^* = I. \qquad (1.218)$$

Its inverse $U^{-1}$ equals its Hermitian transpose $U^*$. A real unitary matrix satisfies

$$U^T U = U U^T = I, \qquad (1.219)$$

and is called orthogonal.²⁷

Unitary matrices preserve norms for all complex vectors,

$$\|Ux\| = \|x\|,$$

and more generally preserve inner products,

$$\langle Ux, Uy \rangle = \langle x, y \rangle.$$

Each eigenvalue of a unitary matrix has unit modulus, and eigenvectors corresponding to distinct eigenvalues are orthogonal. Each eigenvalue of an orthogonal matrix is ±1 or part of a complex conjugate pair $e^{\pm j\theta}$. From (1.178a), its condition number is $\kappa(U) = 1$.

²⁷ It is sometimes a source of confusion that an orthogonal matrix has orthonormal (not merely orthogonal) columns (or rows).
Rotations and Rotoinversions From (1.219), the determinant of an orthogonal matrix satisfies $(\det(U))^2 = 1$. When $\det(U) = 1$, the orthogonal matrix is called a rotation; when $\det(U) = -1$, it is called an improper rotation or rotoinversion. In $\mathbb{R}^2$, a rotation is always of the form

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad (1.220a)$$

and a rotoinversion is always of the form

$$\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}. \qquad (1.220b)$$

The rotoinversion can be interpreted as a composition of a rotation and a reflection of one coordinate. In $\mathbb{R}^N$, a rotation can always be written as a product of $N(N-1)/2$ matrices that each performs a planar rotation in one pair of coordinates. For example, any rotation in $\mathbb{R}^3$ can be written as

$$\begin{bmatrix} \cos\theta_{01} & -\sin\theta_{01} & 0 \\ \sin\theta_{01} & \cos\theta_{01} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta_{02} & 0 & -\sin\theta_{02} \\ 0 & 1 & 0 \\ \sin\theta_{02} & 0 & \cos\theta_{02} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_{12} & -\sin\theta_{12} \\ 0 & \sin\theta_{12} & \cos\theta_{12} \end{bmatrix}.$$

A general rotoinversion can be written similarly with one planar rotation replaced by a planar rotoinversion.
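The sketch below builds the forms (1.220a) and (1.220b) and a product of three planar rotations in $\mathbb{R}^3$ as in the factorization above, then checks orthogonality and the sign of the determinant. The helper names (`rot2`, `rotoinv2`, `planar3`, `det3`) and the angles are arbitrary choices for illustration.

```python
import math

def rot2(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]            # rotation, (1.220a)

def rotoinv2(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]            # rotoinversion, (1.220b)

def planar3(i, k, theta):
    """3 x 3 planar rotation in coordinates (i, k), identity elsewhere."""
    R = [[1.0 if r == col else 0.0 for col in range(3)] for r in range(3)]
    c, s = math.cos(theta), math.sin(theta)
    R[i][i], R[i][k], R[k][i], R[k][k] = c, -s, s, c
    return R

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def det3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Product of planar rotations in the (0,1), (0,2), and (1,2) coordinate pairs.
R = matmul(matmul(planar3(0, 1, 0.3), planar3(0, 2, -1.1)), planar3(1, 2, 2.0))
RtR = matmul([list(r) for r in zip(*R)], R)   # should be the identity
print(det3(R))                                # approximately 1: a rotation
```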



Hermitian, Symmetric, and Normal Matrices A Hermitian matrix is equal to its adjoint:

$$A = A^*. \qquad (1.221a)$$

Such a matrix must be square and is also called self-adjoint. A real Hermitian matrix is equal to its transpose,

$$A = A^T, \qquad (1.221b)$$

and is called symmetric. The 2-norm of a Hermitian matrix is the largest magnitude among its eigenvalues,

$$\|A\|_2 = \max_k |\lambda_k| = \max(|\lambda_{\max}|, |\lambda_{\min}|). \qquad (1.222)$$



All the eigenvalues of a Hermitian matrix are real. All the eigenvectors corresponding to distinct eigenvalues are orthogonal. A Hermitian matrix can be diagonalized as

$$A = U \Lambda U^*, \qquad (1.223a)$$

where U is a unitary matrix with eigenvectors of A as columns, and Λ the diagonal matrix of corresponding eigenvalues; this is the spectral theorem for Hermitian matrices. For the case of A real (symmetric), U is real (orthogonal); thus,

$$A = U \Lambda U^T. \qquad (1.223b)$$
Equation (1.223a) further means that any Hermitian matrix whose eigenvalues are all nonnegative can be factored as $A = QQ^*$, with $Q = U\sqrt{\Lambda}$. The condition number of a Hermitian matrix is given in (1.178b).

A matrix A is called normal when it satisfies $A^* A = A A^*$; in words, it commutes with its Hermitian transpose. Hermitian matrices are obviously normal. A matrix is normal if and only if it can be unitarily diagonalized as in (1.223a).

For a normal (in particular, Hermitian) matrix A, for each eigenvalue $\lambda_k$ there is a singular value $\sigma_k = |\lambda_k|$, where the singular values are listed in nonincreasing order, but the eigenvalues may not be.



Positive Definite Matrices A Hermitian matrix A is called positive semidefinite when, for all nonzero vectors x, the following is satisfied:

$$x^* A x \ge 0. \qquad (1.224)$$

This is also written as $A \succeq 0$. If furthermore (1.224) holds with strict inequality, A is called positive definite, which is written as $A \succ 0$. When a Hermitian matrix A has smallest and largest eigenvalues of $\lambda_{\min}$ and $\lambda_{\max}$, the matrices $\lambda_{\max} I - A$ and $A - \lambda_{\min} I$ are positive semidefinite:

$$\lambda_{\min} I \preceq A \preceq \lambda_{\max} I. \qquad (1.225)$$

All eigenvalues of a positive definite matrix are positive. For any positive definite matrix A, there exists a nonsingular matrix W such that $A = W^* W$, where W is a matrix generalization of the square root of A. One possible way to choose such a square root is to diagonalize A as

$$A = Q \Lambda Q^*, \qquad (1.226)$$

and, since all the eigenvalues are positive, choose $W^* = Q\sqrt{\Lambda}$, where the square root is applied to each diagonal element of Λ.
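The eigendecomposition route above is one way to choose W. A different and commonly used choice, swapped in here and not the one described in the text, is the Cholesky factorization $A = LL^T$ with L lower triangular, which gives $W = L^T$ for a real symmetric positive definite A. The minimal sketch below (real matrices only; `cholesky` is our helper name) raises an error when a pivot is not positive, which for a symmetric matrix happens exactly when it is not positive definite.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T, for real symmetric positive definite A."""
    N = len(A)
    L = [[0.0] * N for _ in range(N)]
    for j in range(N):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, N):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
L = cholesky(A)
W = [list(r) for r in zip(*L)]        # W = L^T, so that A = W^T W
WtW = [[sum(W[k][i] * W[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(L)                              # first column 2, 1, 0; L[1][1] = 2
```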



Circulant Matrices A (right) circulant matrix is a matrix where each row is obtained by a (right) circular shift of the previous row:

$$C = \begin{bmatrix} c_0 & c_{N-1} & \cdots & c_1 \\ c_1 & c_0 & \cdots & c_2 \\ \vdots & \vdots & \ddots & \vdots \\ c_{N-1} & c_{N-2} & \cdots & c_0 \end{bmatrix}. \qquad (1.227)$$

A circulant matrix is diagonalized by the DFT matrix (2.161), as we will see in (2.177). Since the DFT matrix is unitary (up to a scale factor), the eigenvectors it provides for a circulant matrix are orthonormal after normalization.
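The diagonalization can be checked numerically before it is derived in Chapter 2. The sketch below assumes the sign convention $v_k[n] = e^{j2\pi kn/N}$ for the candidate eigenvectors, with eigenvalue $\lambda_k = \sum_n c_n e^{-j2\pi kn/N}$ (the DFT of the first column); how these line up with the rows or columns of the book's DFT matrix (2.161) depends on that matrix's convention, which is not reproduced here. The first column `c` is arbitrary.

```python
import cmath

def circulant(c):
    """Right circulant matrix (1.227): entry (m, n) equals c[(m - n) mod N]."""
    N = len(c)
    return [[c[(m - n) % N] for n in range(N)] for m in range(N)]

c = [1.0, 2.0, 0.0, -1.0]     # arbitrary first column
N = len(c)
C = circulant(c)

lams = []
max_residual = 0.0
for k in range(N):
    # Complex exponential of frequency k, tested as an eigenvector of C.
    v = [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]
    # Matching eigenvalue: the DFT of the first column c at frequency k.
    lam = sum(c[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
    Cv = [sum(C[m][n] * v[n] for n in range(N)) for m in range(N)]
    max_residual = max(max_residual, max(abs(Cv[m] - lam * v[m]) for m in range(N)))
    lams.append(lam)

print(max_residual)   # round-off level, so C v = lam v holds for every k
print(lams)
```

The derivation is one line: $\sum_n c_{(m-n) \bmod N} e^{j2\pi kn/N} = e^{j2\pi km/N} \sum_l c_l e^{-j2\pi kl/N}$.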
Toeplitz Matrices A Toeplitz matrix T is a matrix whose entry $T_{i,k}$ depends only on the value of k − i. A Toeplitz matrix is thus constant along diagonals:

$$T = \begin{bmatrix} t_0 & t_1 & t_2 & \cdots & t_{N-1} \\ t_{-1} & t_0 & t_1 & \cdots & t_{N-2} \\ t_{-2} & t_{-1} & t_0 & \cdots & t_{N-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{-N+1} & t_{-N+2} & t_{-N+3} & \cdots & t_0 \end{bmatrix}. \qquad (1.228)$$

A matrix in which blocks follow the form above is called block Toeplitz.



Band Matrices A band or banded matrix is a square matrix with nonzero entries only in a "band" around the main diagonal. The band need not be symmetric; there may be $N_r$ occupied diagonals on the right side and $N_l$ on the left side. For example, a 5 × 5 matrix with $N_r = 2$ and $N_l = 1$ is of the form:

$$B = \begin{bmatrix} b_{00} & b_{01} & b_{02} & 0 & 0 \\ b_{10} & b_{11} & b_{12} & b_{13} & 0 \\ 0 & b_{21} & b_{22} & b_{23} & b_{24} \\ 0 & 0 & b_{32} & b_{33} & b_{34} \\ 0 & 0 & 0 & b_{43} & b_{44} \end{bmatrix}. \qquad (1.229)$$

Many sets of special matrices are subsets of the band matrices. For example, diagonal matrices have $N_r = N_l = 0$, tridiagonal have $N_r = N_l = 1$, upper-triangular have $N_l = 0$, and lower-triangular have $N_r = 0$.

Square matrices have a well-defined "main antidiagonal" running from the lower-left corner to the upper-right corner. An antidiagonal matrix has nonzero entries only in the main antidiagonal. A useful matrix is the unit antidiagonal matrix, which has 1s in the main antidiagonal.



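Vandermonde Matrices A Vandermonde matrix is a matrix of the form:

$$V = \begin{bmatrix} 1 & \alpha_0 & \alpha_0^2 & \cdots & \alpha_0^{N-1} \\ 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{N-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha_{M-1} & \alpha_{M-1}^2 & \cdots & \alpha_{M-1}^{N-1} \end{bmatrix}. \qquad (1.230)$$

When M = N, the determinant of the matrix is

$$\det(V) = \prod_{0 \le i < j \le N-1} (\alpha_j - \alpha_i). \qquad (1.231)$$

Many useful concepts in sequence processing use Vandermonde matrices, such as the DFT matrix introduced in (2.161a).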
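Formula (1.231) can be confirmed on a small example with exact arithmetic. The determinant here is computed by Gaussian elimination (our choice, swapped in for speed instead of cofactor expansion); `vandermonde` and `det` are hypothetical helper names and the nodes are arbitrary. The formula also shows at a glance that V is nonsingular exactly when the nodes $\alpha_i$ are distinct.

```python
from fractions import Fraction
from itertools import combinations

def vandermonde(alphas, N):
    """Rows [1, a, a^2, ..., a^(N-1)] for each node a, as in (1.230)."""
    return [[Fraction(a) ** j for j in range(N)] for a in alphas]

def det(A):
    """Determinant by Gaussian elimination with exact fractions."""
    A = [row[:] for row in A]
    n, sign, result = len(A), 1, Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            sign = -sign                      # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return sign * result

alphas = [2, -1, 3, Fraction(1, 2)]
V = vandermonde(alphas, len(alphas))
product_formula = Fraction(1)
for i, j in combinations(range(len(alphas)), 2):   # all pairs with i < j
    product_formula *= alphas[j] - alphas[i]
print(det(V) == product_formula)                    # True
```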
1.C Elements of Probability

This appendix reviews basic concepts in the theory of probability, with an emphasis 
on continuous random variables. See [IT] for a thorough but elementary introduction 
or [65] for an introduction with more mathematical sophistication. 

1.C.1 Basic Definitions

Probabilistic Models A probability law $P(\cdot)$ assigns probabilities to events, which are subsets of the outcomes of an experiment. The set of all outcomes is called the sample space and denoted Ω. A probability law satisfies the following axioms:

(i) Nonnegativity. $P(A) \ge 0$ for every event A.
(ii) Additivity. If A and B are disjoint events, then $P(A \cup B) = P(A) + P(B)$; this additivity extends to countable unions of disjoint events.
(iii) Normalization. $P(\Omega) = 1$.



The conditional probability of event A, given event B with $P(B) > 0$, is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Conditioning on B is a restriction of the sample space to B, with rescaling of probabilities such that $P(\cdot \mid B)$ satisfies the normalization axiom and thus is a probability law. If events A and B both have positive probability, then writing $P(A \cap B) = P(A \mid B)\,P(B)$ and $P(A \cap B) = P(B \mid A)\,P(A)$ yields Bayes' Rule:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$

Events A and B are called independent when $P(A \cap B) = P(A)\,P(B)$.
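A small worked example of Bayes' Rule with exact fractions follows; all numbers are hypothetical. Let A be "a condition is present", with $P(A) = 1/100$, and B be "a test is positive", with $P(B \mid A) = 9/10$ and $P(B \mid A^c) = 1/20$. The denominator $P(B)$ is obtained from the law of total probability, a standard step not spelled out above.

```python
from fractions import Fraction

# Hypothetical numbers: A = "condition present", B = "test is positive".
p_A = Fraction(1, 100)
p_B_given_A = Fraction(9, 10)
p_B_given_notA = Fraction(1, 20)

# Law of total probability: P(B) = P(B|A) P(A) + P(B|A^c) P(A^c).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' Rule.
p_A_given_B = p_B_given_A * p_A / p_B
print(p_B, p_A_given_B)   # 117/2000 2/13
```

Even with a fairly accurate test, the posterior probability $2/13 \approx 0.15$ stays small because the prior $P(A)$ is small.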

Continuous Random Variables A real, continuous random variable x has a probability density function (PDF) $f_x$ defined on the real line such that

$$P(x \in A) = \int_A f_x(t) \, dt \qquad (1.232a)$$

is the probability that x falls in the set $A \subset \mathbb{R}$.²⁸ The cumulative distribution function (CDF) of x is

$$F_x(t) = P(x \le t) = \int_{-\infty}^{t} f_x(s) \, ds. \qquad (1.232b)$$



²⁸ Formally, $x : \Omega \to \mathbb{R}$, and $\{\omega \in \Omega \mid x(\omega) \in A\}$ must be an event. There are technical subtleties in the functions $f_x$ and sets A that should be allowed. It is adequate to assume that $f_x$ has a countable number of discontinuities and that A is a countable union of intervals [65]. We refer the reader to the footnote on page 23 for our philosophy on this type of mathematical technicality.
Since probabilities are nonnegative, we must have $f_x(t) \ge 0$ for all $t \in \mathbb{R}$. Since x takes some real value,

$$\int_{-\infty}^{\infty} f_x(t) \, dt = 1.$$

Elementary properties of the CDF include

$$\lim_{t\to-\infty} F_x(t) = 0 \quad \text{and} \quad \lim_{t\to\infty} F_x(t) = 1,$$

and that

$$\frac{d}{dt} F_x(t) = f_x(t) \quad \text{where the derivative exists.}$$

By calling $f_x$ a function, we are excluding Dirac delta components from $f_x$; the CDF $F_x$ is then continuous because it is the integral of the PDF. Allowing Dirac components in $f_x$ would introduce jumps in the CDF. This is necessary for describing discrete or mixed random variables.



Expectation, Moments, and Variance The expectation of a function g(x) is defined as

$$E[g(x)] = \int_{-\infty}^{\infty} g(t) f_x(t) \, dt. \qquad (1.233a)$$

In particular, $E[x^k]$ is called the kth moment. The zeroth moment must be 1, and other moments do not always exist. The first moment is called the mean, and the variance is obtained from the first and second moments:

$$\operatorname{var}(x) = E[(x - E[x])^2] = E[x^2] - (E[x])^2. \qquad (1.233b)$$

Variance is nonnegative. The expectation is linear in that

$$E[\alpha_0 g_0(x) + \alpha_1 g_1(x)] = \alpha_0 E[g_0(x)] + \alpha_1 E[g_1(x)] \qquad (1.233c)$$

for any constants $\alpha_0$ and $\alpha_1$ and any functions $g_0$ and $g_1$. From this it follows that

$$\operatorname{var}(\alpha_0 x + \alpha_1) = \alpha_0^2 \operatorname{var}(x) \qquad (1.233d)$$

for any constants $\alpha_0$ and $\alpha_1$.

Random variables x and y are said to have the same distribution when $E[g(x)] = E[g(y)]$ for any function g. This requires their CDFs to be equal, though their PDFs may differ at a countable number of points.²⁹

²⁹ This is analogous to equality in $\mathcal{L}^p$ for $1 \le p < \infty$: equality of CDFs $F_x$ and $F_y$ implies $\|f_x - f_y\|_{\mathcal{L}^p} = 0$ for any $p \in [1, \infty)$.
Jointly-Distributed Random Variables Real, continuous random variables x and y have a joint PDF $f_{x,y}$ defined such that

$$P((x, y) \in A) = \iint_A f_{x,y}(s, t) \, dt \, ds \qquad (1.234a)$$

is the probability that (x, y) falls in $A \subset \mathbb{R}^2$ and a joint CDF

$$F_{x,y}(s, t) = P(x \le s, \ y \le t) = \int_{-\infty}^{s} \int_{-\infty}^{t} f_{x,y}(u, v) \, dv \, du. \qquad (1.234b)$$

The marginal PDF of x is

$$f_x(s) = \int_{-\infty}^{\infty} f_{x,y}(s, t) \, dt, \qquad (1.234c)$$

and the marginal CDF of x is

$$F_x(s) = \lim_{t\to\infty} F_{x,y}(s, t). \qquad (1.234d)$$

The conditional PDF of x given y is defined as

$$f_{x\mid y}(s \mid t) = \frac{f_{x,y}(s, t)}{f_y(t)} \quad \text{for } t \text{ such that } f_y(t) \neq 0. \qquad (1.234e)$$

The conditional expectation is defined with the conditional PDF:

$$E[g(x) \mid y = t] = \int_{-\infty}^{\infty} g(s) f_{x\mid y}(s \mid t) \, ds. \qquad (1.234f)$$

When $f_{x,y}$ is separable as

$$f_{x,y}(s, t) = f_x(s) f_y(t) \qquad (1.234g)$$

for PDFs $f_x$ and $f_y$, the random variables x and y are called independent. An immediate ramification of independence is that $f_{x\mid y}(s \mid t) = f_x(s)$ for every t such that $f_y(t) \neq 0$. These definitions extend to any number of random variables, with some subtleties for infinite collections.

A complex random variable has real and imaginary parts that are jointly distributed real random variables. A random vector has components that are jointly distributed scalar random variables. The mean of an N-dimensional random vector x is a vector $\mu = E[x] \in \mathbb{R}^N$. The covariance matrix is defined as $E[(x - \mu)(x - \mu)^T]$.

1.C.2 Standard Distributions

Uniform Random Variables For any real numbers a and b with a < b, a random variable with PDF

$$f_x(t) = \begin{cases} 1/(b - a), & t \in [a, b]; \\ 0, & \text{otherwise} \end{cases} \qquad (1.235a)$$

is called uniform on [a, b]. This is denoted $x \sim \mathcal{U}(a, b)$. Simple computations yield

$$F_x(t) = \begin{cases} 0, & \text{for } t < a; \\ (t - a)/(b - a), & \text{for } t \in [a, b]; \\ 1, & \text{for } t > b, \end{cases} \qquad (1.235b)$$

$E[x] = (a + b)/2$, and $\operatorname{var}(x) = (b - a)^2/12$.

Gaussian Random Variables and Vectors For any real μ and positive σ, a random variable with PDF

$$f_x(t) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(t - \mu)^2/(2\sigma^2)} \qquad (1.236)$$

is called Gaussian or normal with mean μ and variance σ². This is denoted $x \sim \mathcal{N}(\mu, \sigma^2)$. When μ = 0 and σ = 1, the random variable is called standard. There is no elementary expression for the CDF of a Gaussian random variable.

For any $\mu \in \mathbb{R}^N$ and symmetric, positive definite $\Sigma \in \mathbb{R}^{N\times N}$, a random vector $x = [x_0, x_1, \ldots, x_{N-1}]^T$ with joint PDF

$$f_x(t) = \frac{1}{(2\pi)^{N/2} (\det(\Sigma))^{1/2}} e^{-\frac{1}{2}(t - \mu)^T \Sigma^{-1} (t - \mu)} \qquad (1.237)$$

is called (jointly) Gaussian or multivariate normal with mean μ and covariance Σ. This is denoted $x \sim \mathcal{N}(\mu, \Sigma)$.

Gaussianity is invariant to affine transformations: when x is jointly Gaussian, Ax + b is also jointly Gaussian for any constant matrix $A \in \mathbb{R}^{M\times N}$ of rank M and constant vector $b \in \mathbb{R}^M$.³⁰ The new mean is $A\mu + b$ and the new covariance matrix is $A \Sigma A^T$.

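The marginal distributions and conditional distributions are also jointly Gaussian. Partition x, $\mu$, and $\Sigma$ (in a dimensionally-compatible manner) as
$$x = \begin{bmatrix} y \\ z \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_y \\ \mu_z \end{bmatrix}, \qquad \text{and} \qquad \Sigma = \begin{bmatrix} \Sigma_y & \Sigma_{y,z} \\ \Sigma_{z,y} & \Sigma_z \end{bmatrix}.$$
The symmetry of $\Sigma$ implies $\Sigma_y = \Sigma_y^T$, $\Sigma_z = \Sigma_z^T$, and $\Sigma_{y,z} = \Sigma_{z,y}^T$. The marginal distribution of y is jointly Gaussian with mean $\mu_y$ and covariance $\Sigma_y$, and the conditional distribution of y given z = t is jointly Gaussian with mean $\mu_y + \Sigma_{y,z}\Sigma_z^{-1}(t - \mu_z)$ and covariance $\Sigma_y - \Sigma_{y,z}\Sigma_z^{-1}\Sigma_{z,y}$. These and other properties of jointly Gaussian vectors are developed in Exercise 1.70.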
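The conditional-distribution formulas can be checked numerically in the smallest case, where y and z are scalars, so that N = 2 in (1.237). The sketch below forms $f_{y|z}(s \mid t) = f_{y,z}(s, t)/f_z(t)$ directly from the joint and marginal PDFs and compares it with a scalar Gaussian whose mean and variance come from the block formulas above. The parameter values and function names are hypothetical.

```python
import math


def normal_pdf(t, mu, var):
    """Scalar Gaussian PDF (1.236) parametrized by its variance."""
    return math.exp(-(t - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bivariate_pdf(s, t, mu_y, mu_z, s_yy, s_yz, s_zz):
    """Joint PDF (1.237) with N = 2, for x = [y, z]^T evaluated at [s, t]^T."""
    det = s_yy * s_zz - s_yz ** 2          # det(Sigma) > 0 for positive definite Sigma
    dy, dz = s - mu_y, t - mu_z
    # (x - mu)^T Sigma^{-1} (x - mu) using the closed-form 2 x 2 inverse.
    q = (s_zz * dy * dy - 2 * s_yz * dy * dz + s_yy * dz * dz) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))


def conditional_params(t, mu_y, mu_z, s_yy, s_yz, s_zz):
    """Mean and variance of y given z = t, from the block formulas above."""
    mean = mu_y + s_yz / s_zz * (t - mu_z)
    var = s_yy - s_yz * s_yz / s_zz
    return mean, var


params = dict(mu_y=1.0, mu_z=-0.5, s_yy=2.0, s_yz=0.8, s_zz=1.5)  # hypothetical values
t = 0.7
mean, var = conditional_params(t, **params)
for s in (-1.0, 0.0, 2.5):
    # f_{y|z}(s | t) = f_{y,z}(s, t) / f_z(t); compare with N(mean, var) at s.
    ratio = bivariate_pdf(s, t, **params) / normal_pdf(t, params["mu_z"], params["s_zz"])
    print(s, ratio, normal_pdf(s, mean, var))
```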

Gaussian random variables are very common in modeling physical phenomena 
because they arise from the accumulation of a large number of small, independent, 
random effects. This is made precise by the central limit theorem, a simple version 
of which is as follows: 



30 Some authors require the covariance matrix of a jointly Gaussian vector to be positive semidef- 
inite rather than positive definite. In this case, the rank condition on A can be removed, and the 
PDF does not necessarily exist. 
Let $x_1, x_2, \ldots$ be independent, identically-distributed (i.i.d.) random variables with mean $\mu$ and variance $\sigma^2$. For each $n \in \mathbb{Z}^+$, define a shifted and scaled version of the sample mean of $\{x_k\}_{k=1}^{n}$:
$$z_n = \frac{\sqrt{n}}{\sigma}\left(\frac{1}{n}\sum_{k=1}^{n} x_k - \mu\right). \tag{1.238a}$$
These random variables converge in distribution to a standard normal random variable z:
$$\lim_{n \to \infty} F_{z_n}(t) = F_z(t) \quad \text{for all } t \in \mathbb{R}. \tag{1.238b}$$
Similar results hold under conditions that allow weak dependence of the variables. 
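A small simulation makes (1.238b) concrete. The sketch below is illustrative rather than part of the text. It draws i.i.d. samples from U(0, 1), which has mean 1/2 and variance 1/12 by (1.235b), forms $z_n$ as in (1.238a) with n = 30, and compares the empirical CDF with the standard normal CDF written via `math.erf`. The sample sizes and the seed are arbitrary assumptions.

```python
import math
import random


def std_normal_cdf(t):
    """F_z(t) for a standard normal z, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def clt_samples(n, trials, rng):
    """Draw `trials` realizations of z_n from (1.238a) using U(0, 1) variables."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and standard deviation of U(0, 1)
    out = []
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        out.append(math.sqrt(n) * (sample_mean - mu) / sigma)
    return out


def empirical_cdf(samples, t):
    """Fraction of samples that are <= t."""
    return sum(1 for z in samples if z <= t) / len(samples)


zs = clt_samples(n=30, trials=20_000, rng=random.Random(0))
for t in (-1.5, 0.0, 1.0):
    print(t, empirical_cdf(zs, t), std_normal_cdf(t))
```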

1.C.3 Estimation

Estimation is the process of forming estimates of parameters of interest from ob- 
servations that are probabilistically related to the parameters. Bayesian and non- 
Bayesian (classical) techniques are distinguished by whether the parameters are 
considered to be random variables; observations are random in either case. For 
simplicity, generalities below are stated for a continuous scalar parameter and con- 
tinuous scalar observations. Some examples demonstrate extensions to vectors. 

Bayesian Estimation In Bayesian estimation, the parameter of interest is assumed 
to be a random variable x. Its distribution is called the prior distribution to empha- 
size that it describes x without use of observations. The conditional distribution 
of the observation y, given the parameter x, follows a distribution $f_{y|x}$ called the likelihood. After observing y = t, Bayes' Rule specifies the posterior distribution of x to be
$$f_{x|y}(s \mid t) = \frac{f_x(s)\, f_{y|x}(t \mid s)}{f_y(t)} = \frac{f_x(s)\, f_{y|x}(t \mid s)}{\int_{-\infty}^{\infty} f_x(u)\, f_{y|x}(t \mid u)\, du}.$$
Bayesian estimators are derived by using the posterior distribution to optimize a criterion of interest, which yields the best function $\hat{x} = g(y)$.
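Bayes' Rule can also be evaluated numerically when no closed form is at hand. The sketch below discretizes the posterior on a uniform grid, replacing the integral in the denominator with a Riemann sum. The Gaussian prior and the Gaussian likelihood are hypothetical choices, made so that the result can be compared with the closed-form value 0.8 obtained from Example 1.64. The grid limits and spacing are also assumptions.

```python
import math


def normal_pdf(t, mu, var):
    """Scalar Gaussian PDF parametrized by its variance."""
    return math.exp(-(t - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def grid_posterior(t, prior, likelihood, lo=-10.0, hi=10.0, steps=4001):
    """Discretized Bayes' rule on a uniform grid of parameter values s."""
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]
    unnorm = [prior(s) * likelihood(t, s) for s in grid]   # f_x(s) f_{y|x}(t | s)
    evidence = sum(unnorm) * h                              # Riemann sum for f_y(t)
    return grid, [u / evidence for u in unnorm], h


var_x, var_z, t = 1.0, 0.5, 1.2       # hypothetical prior/noise variances and observation
grid, post, h = grid_posterior(
    t,
    prior=lambda s: normal_pdf(s, 0.0, var_x),            # x ~ N(0, var_x)
    likelihood=lambda obs, s: normal_pdf(obs, s, var_z),  # y | x = s ~ N(s, var_z)
)
post_mean = sum(s * p for s, p in zip(grid, post)) * h
print(post_mean)   # close to var_x / (var_x + var_z) * t = 0.8
```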

A common performance criterion is the mean-squared error (MSE) $E[(\hat{x} - x)^2]$. In the trivial case of having no observation available, $\hat{x}$ is simply a constant c. The MSE is minimized by $c = E[x]$. This is verified through the following expansion:
$$E[(x - c)^2] \overset{(a)}{=} \operatorname{var}(x - c) + (E[x - c])^2 \overset{(b)}{=} \operatorname{var}(x) + (E[x] - c)^2,$$
where (a) uses (1.233b); and (b) uses (1.233c) and (1.233d). When y = t has been observed, x is conditionally distributed as $f_{x|\{y = t\}}$, and the MSE is minimized by
$$\hat{x}_{\rm MMSE}(t) = E[x \mid y = t].$$
Example 1.64 (Bayesian MMSE estimation: Gaussian case) Let x and y be jointly Gaussian vectors, meaning that their concatenation $[x^T\ y^T]^T$ has a PDF of the form (1.237). Assume $E[x] = \mu_x$ and $E[y] = \mu_y$, and write the covariance as
$$\Sigma = \begin{bmatrix} \Sigma_x & \Sigma_{x,y} \\ \Sigma_{x,y}^T & \Sigma_y \end{bmatrix},$$
where $\Sigma_x$ is the covariance of x, $\Sigma_y$ is the covariance of y, and $\Sigma_{x,y} = E[(x - \mu_x)(y - \mu_y)^T]$ is the cross covariance between x and y.

The conditional PDF of x given y = t is jointly Gaussian with mean $\mu_x + \Sigma_{x,y}\Sigma_y^{-1}(t - \mu_y)$ and covariance $\Sigma_x - \Sigma_{x,y}\Sigma_y^{-1}\Sigma_{x,y}^T$ (see Appendix 1.C.2). Since the conditional mean minimizes MSE, we have
$$\hat{x}_{\rm MMSE}(t) = \mu_x + \Sigma_{x,y}\Sigma_y^{-1}(t - \mu_y). \tag{1.239}$$
Its resulting MSE is
$$E[\|x - \hat{x}_{\rm MMSE}\|^2] = \operatorname{tr}\big(\Sigma_x - \Sigma_{x,y}\Sigma_y^{-1}\Sigma_{x,y}^T\big).$$
Note that the same optimal estimator arises for general (non-Gaussian) distributions when the estimator is restricted to be linear; see Exercise 1.34.

As a special case, suppose y = x + z, where $z \sim \mathcal{N}(0, \sigma_z^2 I)$ and x and z are independent. We say that y is an observation of x with additive white Gaussian noise (AWGN). Then $\mu_y = \mu_x$, $\Sigma_{x,y} = E[(x - \mu_x)(x - \mu_x + z)^T] = \Sigma_x$, and $\Sigma_y = E[(x - \mu_x + z)(x - \mu_x + z)^T] = \Sigma_x + \sigma_z^2 I$. The optimal estimator from observation y = t and its performance simplify to
$$\hat{x}_{\rm MMSE}(t) = \mu_x + \Sigma_x(\Sigma_x + \sigma_z^2 I)^{-1}(t - \mu_x),$$
$$E[\|x - \hat{x}_{\rm MMSE}\|^2] = \operatorname{tr}\big(\Sigma_x - \Sigma_x(\Sigma_x + \sigma_z^2 I)^{-1}\Sigma_x\big).$$
Specializing further, suppose x is scalar with mean zero and variance $\sigma_x^2$. Then
$$\hat{x}_{\rm MMSE}(t) = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}\, t$$
and
$$E[|x - \hat{x}_{\rm MMSE}|^2] = \sigma_x^2\left(1 - \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}\right) = \frac{\sigma_x^2\sigma_z^2}{\sigma_x^2 + \sigma_z^2}.$$
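The scalar AWGN result is easy to check by simulation. The Monte Carlo sketch below is illustrative: the variances, the trial count, and the seed are hypothetical. It draws zero-mean x with variance $\sigma_x^2$, adds independent noise of variance $\sigma_z^2$, and compares the empirical MSE of the MMSE estimate with $\sigma_x^2\sigma_z^2/(\sigma_x^2 + \sigma_z^2)$. As a reference, it also measures the naive estimate $\hat{x} = y$, whose MSE is $\sigma_z^2$.

```python
import random


def mmse_scalar(t, var_x, var_z):
    """MMSE estimate of zero-mean scalar x from y = t, where y = x + z (AWGN)."""
    return var_x / (var_x + var_z) * t


def empirical_mse(var_x, var_z, trials, rng):
    """Monte Carlo MSE of the MMSE estimator and of the naive estimate x_hat = y."""
    err_mmse = err_naive = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, var_x ** 0.5)        # random.gauss takes a standard deviation
        y = x + rng.gauss(0.0, var_z ** 0.5)
        err_mmse += (mmse_scalar(y, var_x, var_z) - x) ** 2
        err_naive += (y - x) ** 2
    return err_mmse / trials, err_naive / trials


var_x, var_z = 1.0, 0.5
mse, mse_naive = empirical_mse(var_x, var_z, trials=200_000, rng=random.Random(0))
theory = var_x * var_z / (var_x + var_z)
print(mse, theory)        # both approximately 0.333
print(mse_naive, var_z)   # both approximately 0.5
```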



Classical Estimation In classical estimation, the parameter of interest is treated as 
an unknown non-random quantity. The observation y is random, with a likelihood 
(distribution) $f_{y;x}$ that depends on the parameter $x$.$^{31}$ An estimator $\hat{x}(y)$ produces
an estimate of x from the observation. Since it is a function of the random variable 
y, an estimate is also a random variable with a distribution that depends on x. The 
dependence on x is emphasized with a subscript in the following. 



31 Some authors write this as $f_{y|x}$, with the potential to confuse random and non-random quantities.
The error of an estimator is $\hat{x}(y) - x$, and the bias is the expected error:
$$b_x(\hat{x}(y)) = E_x[\hat{x}(y) - x] = E_x[\hat{x}(y)] - x.$$
An unbiased estimator has $b_x(\hat{x}(y)) = 0$ for all x. The mean-squared error of an estimator is
$$E_x[|\hat{x}(y) - x|^2].$$
It depends on x, and it can be expanded as the sum of the variance and the square of the bias:
$$E_x[|\hat{x}(y) - x|^2] = \operatorname{var}_x(\hat{x}(y)) + \big(b_x(\hat{x}(y))\big)^2.$$
Sometimes attention is limited to unbiased estimators, in which case the MSE is minimized by minimizing the variance of the estimator. This results in the minimum-variance unbiased estimator (MVUE).
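The bias-variance expansion also holds exactly for empirical averages, which gives a convenient numerical check. The sketch below is hypothetical throughout: the parameter value, the noise level, the sample size, and the shrinkage factor c are all illustrative. It uses a deliberately biased estimator, a shrunken sample mean $c \cdot \frac{1}{n}\sum_i y_i$, and verifies that empirical MSE equals empirical variance plus squared empirical bias. It also compares the bias and the variance with their theoretical values $(c - 1)x$ and $c^2\sigma^2/n$.

```python
import random


def mse_decomposition(estimates, x):
    """Empirical MSE, variance, and bias of a list of estimates of a fixed x."""
    T = len(estimates)
    mean_est = sum(estimates) / T
    mse = sum((e - x) ** 2 for e in estimates) / T
    var = sum((e - mean_est) ** 2 for e in estimates) / T   # 1/T normalization
    bias = mean_est - x
    return mse, var, bias


rng = random.Random(1)
x_true, sigma, n, c = 2.0, 1.0, 10, 0.8   # hypothetical parameter, noise, sample size, shrinkage
estimates = []
for _ in range(20_000):
    ys = [x_true + rng.gauss(0.0, sigma) for _ in range(n)]
    estimates.append(c * sum(ys) / n)      # shrunken sample mean; biased when c != 1
mse, var, bias = mse_decomposition(estimates, x_true)
print(mse, var + bias ** 2)            # equal up to rounding
print(bias, (c - 1) * x_true)          # both approximately -0.4
print(var, c ** 2 * sigma ** 2 / n)    # both approximately 0.064
```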

Example 1.65 (Classical MMSE estimation: Gaussian case) Let $x \in \mathbb{R}^N$ be a parameter of interest, and let y = Ax + z, where $A \in \mathbb{R}^{M \times N}$ is a known matrix and $z \sim \mathcal{N}(0, \Sigma)$. Since z has zero mean, the estimator $\hat{x}(y) = By$ is unbiased whenever BA = I. Assuming BA = I, the MSE of the estimator is
$$E_x[\|B(Ax + z) - x\|^2] = E_x[\|Bz\|^2] = \operatorname{tr}(B\Sigma B^T).$$
Since this MSE does not depend on x, it can be minimized through the choice of B to yield a valid estimator. The MSE is minimized by $B = (A^T\Sigma^{-1}A)^{-1}A^T\Sigma^{-1}$. The resulting MSE is $\operatorname{tr}\big((A^T\Sigma^{-1}A)^{-1}\big)$.

Note that $A^T\Sigma^{-1}A$ must be invertible for the estimator above to exist. If rank(A) < N, it is hopeless to form an estimate of x without prior information; the component of x in the null space of A is unobserved.

As a special case, suppose $\Sigma = \sigma^2 I$. Then the optimal estimator simplifies to $B = (A^TA)^{-1}A^T$, the pseudoinverse of A.
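For the special case $\Sigma = \sigma^2 I$, the estimator is the pseudoinverse. The pure-Python sketch below forms $B = (A^TA)^{-1}A^T$ for a hypothetical 3 x 2 matrix A, so that N = 2 and a closed-form 2 x 2 inverse suffices. It checks that BA = I, the unbiasedness condition, and that noiseless data are mapped back to x exactly. The helper names are illustrative.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def inv2(M):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("A^T A is singular: rank(A) < N")
    return [[d / det, -b / det], [-c / det, a / det]]


def pseudoinverse_2col(A):
    """B = (A^T A)^{-1} A^T for a full-column-rank A with two columns."""
    At = transpose(A)
    return matmul(inv2(matmul(At, A)), At)


A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # hypothetical M = 3, N = 2 example
B = pseudoinverse_2col(A)
BA = matmul(B, A)                          # should be the 2 x 2 identity
x_true = [[3.0], [-1.0]]
x_hat = matmul(B, matmul(A, x_true))       # noiseless observation y = A x
print(BA, x_hat)
```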
Chapter at a Glance

In this chapter, our goal was to find representations given by linear operators such that
$$x = \Phi\tilde{\Phi}^* x.$$
After finding $\Phi$ and $\tilde{\Phi}$ such that $\Phi\tilde{\Phi}^* = I$, we call
$$\alpha = \tilde{\Phi}^* x \qquad \text{and} \qquad x = \Phi\alpha = \Phi\tilde{\Phi}^* x$$
a decomposition and a reconstruction, respectively. Also, $\Phi\alpha$ is often called a representation of a signal. The elements of $\alpha$ are called expansion or transform coefficients and include Fourier, wavelet, and Gabor coefficients, as well as many others. We decompose signals to look into their properties in the transform domain. After analysis or manipulations such as compression, transmission, etc., we reconstruct the signal from its expansion coefficients. We studied cases distinguished by the properties of $\Phi$; in finite dimensions,

(i) If $\Phi$ is square and nonsingular, then $\Phi$ is a basis and $\tilde{\Phi}$ is its dual basis.

(ii) If $\Phi$ is unitary, that is, $\Phi\Phi^* = I$, then $\Phi$ is an orthonormal basis and $\tilde{\Phi} = \Phi$.

(iii) If $\Phi$ is rectangular and full rank, then $\Phi$ is a frame and $\tilde{\Phi}$ is its dual frame.

(iv) If $\Phi$ is rectangular and $\Phi\Phi^* = I$, then $\Phi$ is a tight frame and $\tilde{\Phi} = \Phi$.

Which of these options we choose depends entirely on our application and the criteria for designing such matrices (representations). In the book, these criteria are based on time-frequency considerations; we explore them in more detail in Chapter 6.
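Case (iv) can be made concrete with a small example. The sketch below builds three unit vectors at 120 degrees in $\mathbb{R}^2$, scaled by $\sqrt{2/3}$. This configuration is sometimes called the Mercedes-Benz frame; here it is only an illustrative choice. The sketch checks that $\Phi\Phi^* = I$, that $x = \Phi\Phi^* x$, and that the norm is preserved even though M = 3 > N = 2 coefficients are used. The helper names are hypothetical.

```python
import math


def mercedes_benz_frame():
    """phi_k = sqrt(2/3) [cos(2 pi k/3), sin(2 pi k/3)]^T for k = 0, 1, 2 (N = 2, M = 3)."""
    c = math.sqrt(2.0 / 3.0)
    return [[c * math.cos(2 * math.pi * k / 3), c * math.sin(2 * math.pi * k / 3)] for k in range(3)]


def frame_operator(vectors):
    """Phi Phi^* = sum_k phi_k phi_k^T for real vectors."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(n)] for i in range(n)]


def analyze(x, vectors):
    """alpha_k = <x, phi_k>, i.e., alpha = Phi^* x."""
    return [sum(a * b for a, b in zip(x, v)) for v in vectors]


def synthesize(alpha, vectors):
    """x = sum_k alpha_k phi_k, i.e., x = Phi alpha."""
    n = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(alpha, vectors)) for i in range(n)]


phis = mercedes_benz_frame()
S = frame_operator(phis)          # approximately the 2 x 2 identity
x = [0.3, -1.7]
alpha = analyze(x, phis)          # M = 3 coefficients for an N = 2 signal
x_rec = synthesize(alpha, phis)   # equals x up to rounding
print(S, alpha, x_rec)
```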



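Signal Representations in Finite-Dimensional Spaces

Property | Orth. basis | Biorth. basis | Tight frame | General frame
Expansion set | $\Phi = \{\varphi_k\}_{k=0}^{N-1}$, $\varphi_k \in \mathbb{C}^N$ | $\Phi = \{\varphi_k\}_{k=0}^{N-1}$, $\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k=0}^{N-1}$, $\varphi_k, \tilde{\varphi}_k \in \mathbb{C}^N$ | $\Phi = \{\varphi_k\}_{k=0}^{M-1}$, $\varphi_k \in \mathbb{C}^N$, $M > N$ | $\Phi = \{\varphi_k\}_{k=0}^{M-1}$, $\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k=0}^{M-1}$, $\varphi_k, \tilde{\varphi}_k \in \mathbb{C}^N$, $M > N$
Structure | $\langle \varphi_i, \varphi_k \rangle = \delta_{i-k}$ | $\langle \varphi_i, \tilde{\varphi}_k \rangle = \delta_{i-k}$ | None | None
Expansion | $\sum_{k=0}^{N-1} \langle x, \varphi_k \rangle \varphi_k$ | $\sum_{k=0}^{N-1} \langle x, \tilde{\varphi}_k \rangle \varphi_k$ | $\sum_{k=0}^{M-1} \langle x, \varphi_k \rangle \varphi_k$ | $\sum_{k=0}^{M-1} \langle x, \tilde{\varphi}_k \rangle \varphi_k$
Matrix view | $\Phi$ of size $N \times N$; $\Phi$ unitary; $\Phi\Phi^* = I$ | $\Phi$ of size $N \times N$; $\Phi$ full rank $N$; $\Phi\tilde{\Phi}^* = I$, $\tilde{\Phi} = (\Phi^*)^{-1}$ | $\Phi$ of size $N \times M$; rows of $\Phi$ orthogonal; $\Phi\Phi^* = I$ | $\Phi$ of size $N \times M$; $\Phi$ full rank $N$; $\Phi\tilde{\Phi}^* = I$
Norm pres. | Yes, $\|x\|^2 = \sum_{k=0}^{N-1} |\langle x, \varphi_k \rangle|^2$ | No | Yes, $\|x\|^2 = \sum_{k=0}^{M-1} |\langle x, \varphi_k \rangle|^2$ | No
Successive approx. | Yes, $x^{(k)} = x^{(k-1)} + \langle x, \varphi_k \rangle \varphi_k$ | No | No | No
Redundant | No | No | Yes | Yes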
Table 1.3: Signal representations in finite-dimensional spaces. 
Chapter 1. From Euclid to Hilbert




Historical Remarks 

The choice of the title for this chapter requires at least some acknowledgment of the two 
mathematical giants figuring in it: Euclid and Hilbert. 

Little is known about Euclid (c. 300 BC) apart from his writings.
He was a Greek mathematician who lived and worked in Alexandria,
Egypt. His book Elements [68] (in fact, 13 books) remains the most
successful textbook in the history of mathematics. In it, he introduces
and discusses many topics, most of which have taken hold in our
consciousness as immutable truths, such as the principles of Euclidean
geometry. He also gave numerous results in number theory, including
a simple proof that there are infinitely many prime numbers and a
procedure for finding the greatest common divisor of two numbers.
Moreover, it was Euclid who introduced the axiomatic method upon
which all mathematical knowledge today is based.

David Hilbert (1862–1943) was a German mathematician, known
for an axiomatization of geometry supplanting Euclid's five original
axioms. He was one of the most universal mathematicians of all
time, contributing to functional analysis, number theory and physics,
among others. His efforts towards banishing theoretical uncertainties
in mathematics ended in failure [174]:
"Gödel demonstrated that any non-contradictory formal system,
which was comprehensive enough to include at least arithmetic,
cannot demonstrate its completeness by way of its own axioms."
At the turn of the 20th century, he produced a list of 23 unsolved
problems, generally thought to be the most thoughtful and comprehensive
such list ever. He worked closely with another famous mathematician, Minkowski,
and had as students or assistants such illustrious names as Weyl, von Neumann, Courant
and many others. He taught all his life, first at the University of Königsberg and then at
the University of Göttingen, where he died in 1943. On his tombstone, one of his famous
sayings is inscribed:

Wir müssen wissen. Wir werden wissen.$^{32}$




Further Reading 

Books and Textbooks Below is a sample list of books/textbooks in which more infor- 
mation can be found about various topics we discussed in this chapter. Some of them are 
standard in the signal processing community and others we have used while writing this 
book. 

Standard books on probability are by Bertsekas and Tsitsiklis [11] and Papoulis [110]. 

Many good reference texts exist on linear algebra, for example, Gantmacher [56] and 
Strang [139]. Good reviews are also provided by Kailath in [83] and Vaidyanathan [158]. 
Books by Kreyszig [93], Luenberger [98], Gohberg and Goldberg [59], and Young [176]
provide details on abstract vector spaces. In particular, parts of our proof of the projection
theorem, Theorem 1.26, follow [98] closely. Daubechies in [41] discusses Riesz bases, while
more on frames can be found in [29, 190].

Parametrizations of unitary matrices in various forms, such as those using Givens rotations
or Householder building blocks, are given in [158].

32 We must know. We will know.



Exercises with Solutions 

1.1. Vector Space $\mathbb{C}^N$
Prove that:

(i) $\mathbb{C}^N$ is a vector space.

(ii) The finite cross product of vector spaces, $V = V_0 \times V_1 \times \cdots \times V_{N-1}$, where $V_0, V_1, \ldots, V_{N-1}$ are each a vector space, is a vector space as well. The finite cross product is defined as the set of sequences $x = [x_0\ x_1\ \cdots\ x_{N-1}]^T$ where $x_0 \in V_0, \ldots, x_{N-1} \in V_{N-1}$.
Solution: 

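(i) To prove that $\mathbb{C}^N$ is a vector space, we need to check that the conditions stated in Definition 1.1 are satisfied. We prove the following for any $x$, $y$, and $z$ in $\mathbb{C}^N$ and any scalars $\alpha, \beta \in \mathbb{C}$. While these are rather trivial, we go through the details once.

1. Commutativity:
$$\begin{aligned}
x + y &= [x_0\ x_1\ \cdots\ x_{N-1}]^T + [y_0\ y_1\ \cdots\ y_{N-1}]^T \\
&= [x_0 + y_0\ \ x_1 + y_1\ \cdots\ x_{N-1} + y_{N-1}]^T \\
&= [y_0 + x_0\ \ y_1 + x_1\ \cdots\ y_{N-1} + x_{N-1}]^T \\
&= [y_0\ y_1\ \cdots\ y_{N-1}]^T + [x_0\ x_1\ \cdots\ x_{N-1}]^T = y + x.
\end{aligned}$$

2. Associativity:
$$\begin{aligned}
(x + y) + z &= [x_0 + y_0\ \ x_1 + y_1\ \cdots\ x_{N-1} + y_{N-1}]^T + [z_0\ z_1\ \cdots\ z_{N-1}]^T \\
&= [(x_0 + y_0) + z_0\ \ (x_1 + y_1) + z_1\ \cdots\ (x_{N-1} + y_{N-1}) + z_{N-1}]^T \\
&= [x_0 + (y_0 + z_0)\ \ x_1 + (y_1 + z_1)\ \cdots\ x_{N-1} + (y_{N-1} + z_{N-1})]^T \\
&= [x_0\ x_1\ \cdots\ x_{N-1}]^T + [y_0 + z_0\ \ y_1 + z_1\ \cdots\ y_{N-1} + z_{N-1}]^T \\
&= x + (y + z),
\end{aligned}$$
and
$$\begin{aligned}
(\alpha\beta)x &= [(\alpha\beta)x_0\ \ (\alpha\beta)x_1\ \cdots\ (\alpha\beta)x_{N-1}]^T \\
&= [\alpha(\beta x_0)\ \ \alpha(\beta x_1)\ \cdots\ \alpha(\beta x_{N-1})]^T = \alpha(\beta x).
\end{aligned}$$

3. Distributivity: Follows similarly to the above two.

4. Additive identity: The element $0 = [0\ \cdots\ 0]^T \in \mathbb{C}^N$ is unique, since all its components ($0 \in \mathbb{C}$) are unique, and
$$\begin{aligned}
x + 0 &= [x_0 + 0\ \ x_1 + 0\ \cdots\ x_{N-1} + 0]^T \\
&= [0 + x_0\ \ 0 + x_1\ \cdots\ 0 + x_{N-1}]^T = 0 + x \\
&= [x_0\ x_1\ \cdots\ x_{N-1}]^T = x.
\end{aligned}$$

5. Additive inverse: The element $(-x) = [-x_0\ \ -x_1\ \cdots\ -x_{N-1}]^T \in \mathbb{C}^N$ is unique, since $(-x_i)$ for $i = 0, 1, \ldots, N-1$ are unique in $\mathbb{C}$, and
$$\begin{aligned}
x + (-x) &= [x_0 + (-x_0)\ \ x_1 + (-x_1)\ \cdots\ x_{N-1} + (-x_{N-1})]^T \\
&= [(-x_0) + x_0\ \ (-x_1) + x_1\ \cdots\ (-x_{N-1}) + x_{N-1}]^T \\
&= (-x) + x = [0\ \cdots\ 0]^T = 0.
\end{aligned}$$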
6. Multiplicative identity: Follows similarly to additive identity.

Thus $\mathbb{C}^N$ is a vector space. Note that all the arguments above rely on the fact that $\mathbb{C}$ itself is a vector space, and thus addition and scalar multiplication satisfy all the necessary properties.

(ii) The finite cross product of vector spaces is a vector space. That is, let $V_0, V_1, \ldots, V_{N-1}$ be N vector spaces. We denote by $V = V_0 \times V_1 \times \cdots \times V_{N-1}$ the set of sequences $x = [x_0\ x_1\ \cdots\ x_{N-1}]^T$ where $x_0 \in V_0, \ldots, x_{N-1} \in V_{N-1}$. Then V is a vector space with the following operations:
$$x + y = [x_0 \oplus_{V_0} y_0\ \ \cdots\ \ x_{N-1} \oplus_{V_{N-1}} y_{N-1}]^T,$$
where "$\oplus_{V_i}$" is the addition in $V_i$ for $i = 0, 1, \ldots, N-1$, and
$$\alpha x = [\alpha \odot_{V_0} x_0\ \ \cdots\ \ \alpha \odot_{V_{N-1}} x_{N-1}]^T,$$
where "$\odot_{V_i}$" is the scalar multiplication in $V_i$ for $i = 0, 1, \ldots, N-1$.

1.2. Vector Space $\ell^p(\mathbb{Z})$

Show that for any $p \in [1, \infty)$, the vectors in $\mathbb{C}^{\mathbb{Z}}$ with finite $\ell^p(\mathbb{Z})$ norm form a vector space. (Hint: Use Minkowski's inequality (1.198a).)

Solution: Since $\ell^p(\mathbb{Z})$ is a subset of $\mathbb{C}^{\mathbb{Z}}$, we are using the vector addition and scalar multiplication operations of $\mathbb{C}^{\mathbb{Z}}$. Thus, we need not check commutativity, associativity, distributivity, and multiplicative identity properties. The additive identity $0 \in \mathbb{C}^{\mathbb{Z}}$ has zero $\ell^p$ norm and thus is in the subset under consideration. For any $x \in \mathbb{C}^{\mathbb{Z}}$ with finite $\ell^p$ norm, $-x$ also has finite $\ell^p$ norm, so the subset under consideration has the additive inverse property.

What remains is to show that, if $x, y \in \ell^p(\mathbb{Z})$ and $\alpha \in \mathbb{C}$, then (i) $\alpha x \in \ell^p(\mathbb{Z})$; and (ii) $x + y \in \ell^p(\mathbb{Z})$. To check (i), note that
$$\|\alpha x\|_p^p = \sum_{n=-\infty}^{\infty} |\alpha x_n|^p = |\alpha|^p \sum_{n=-\infty}^{\infty} |x_n|^p = |\alpha|^p \|x\|_p^p < \infty.$$
Property (ii) follows immediately from Minkowski's inequality, since $\|x + y\|_p \le \|x\|_p + \|y\|_p < \infty$.

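1.3. Incompleteness of $C([0, 1])$

Consider the sequence of functions $\{x_k\}_{k \ge 2}$ on [0, 1] defined by
$$x_k(t) = \begin{cases} 0, & t \in [0, \tfrac{1}{2} - \tfrac{1}{k}); \\ k(t - \tfrac{1}{2}) + 1, & t \in [\tfrac{1}{2} - \tfrac{1}{k}, \tfrac{1}{2}); \\ 1, & t \in [\tfrac{1}{2}, 1]. \end{cases}$$

(i) Sketch the sequence of functions.

(ii) Show that $x_k$ is a Cauchy sequence under the $\mathcal{L}^2$ norm.

(iii) Show that $x_k \to f$ under the $\mathcal{L}^2$ norm for a discontinuous function f. Since $f \notin C([0, 1])$, this shows that $C([0, 1])$ is not complete under the $\mathcal{L}^2$ norm.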

Solution: 



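(i) Figure E1.3-1 shows the plots of the functions $x_3$, $x_{10}$, and $x_{30}$.

(ii) The sequence $(x_n)_{n \ge 2}$ is a Cauchy sequence under the $\mathcal{L}^2$ norm if for any $\varepsilon > 0$ there exists an integer $N_0 > 0$ such that, for any integers $m, n > N_0$, the norm of the difference of $x_n$ and $x_m$ is bounded by $\varepsilon$: $\|x_m - x_n\|_2 < \varepsilon$.

Assume m > n and consider the function $x_m - x_n$:
$$x_m(t) - x_n(t) = \begin{cases} 0, & t \in [0, \tfrac{1}{2} - \tfrac{1}{n}) \cup [\tfrac{1}{2}, 1]; \\ -n(t - \tfrac{1}{2}) - 1, & t \in [\tfrac{1}{2} - \tfrac{1}{n}, \tfrac{1}{2} - \tfrac{1}{m}); \\ (m - n)(t - \tfrac{1}{2}), & t \in [\tfrac{1}{2} - \tfrac{1}{m}, \tfrac{1}{2}). \end{cases}$$

Since our sequence starts at n = 2, for any $\varepsilon > 0$ we define $N_0 = \max\big(2, \lceil 4/(3\varepsilon^2) \rceil\big)$.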
Figure E1.3-1: Example functions $x_k(t)$. Panels: (a) $x_3(t)$, (b) $x_{10}(t)$, (c) $x_{30}(t)$, each plotted over $t \in [0, 1]$.



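Then for any $m > n > N_0$ the following holds:
$$\begin{aligned}
\|x_m - x_n\|_2^2 &= \int_0^1 |x_m(t) - x_n(t)|^2\, dt \\
&= \int_{1/2 - 1/n}^{1/2 - 1/m} \big(n(t - \tfrac{1}{2}) + 1\big)^2 dt + \int_{1/2 - 1/m}^{1/2} (m - n)^2 (t - \tfrac{1}{2})^2\, dt \\
&= \frac{(1 - n/m)^3}{3n} + \frac{(m - n)^2}{3m^3} = \frac{(m - n)^2}{3m^2 n} \\
&\le \frac{(m + n)^2}{3m^2 n} \le \frac{(m + m)^2}{3m^2 n} = \frac{4}{3n} < \frac{4}{3N_0} \le \varepsilon^2.
\end{aligned}$$
Hence, $\|x_m - x_n\|_2 < \varepsilon$, and by definition, $(x_n)$ is a Cauchy sequence under the $\mathcal{L}^2$ norm.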



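(iii) The sequence converges pointwise to
$$\lim_{n \to \infty} x_n(t) = f(t) = \begin{cases} 0, & t \in [0, \tfrac{1}{2}); \\ 1, & t \in [\tfrac{1}{2}, 1]. \end{cases}$$
It also converges in the $\mathcal{L}^2$ norm, since $\|x_n - f\|_2^2 = \int_{1/2 - 1/n}^{1/2} \big(n(t - \tfrac{1}{2}) + 1\big)^2 dt = \frac{1}{3n} \to 0$. Since f has a discontinuity at $t = \tfrac{1}{2}$, $f \notin C([0, 1])$.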
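The closed forms above can be confirmed numerically. These are $\|x_m - x_n\|_2^2 = (m - n)^2/(3m^2 n)$, its bound $4/(3n)$, and $\|x_n - f\|_2^2 = 1/(3n)$. The sketch below approximates the $\mathcal{L}^2$ distances with a midpoint rule on [0, 1]. The step count and the chosen pairs (m, n) are arbitrary assumptions.

```python
def x_k(t, k):
    """The continuous ramp functions of Exercise 1.3."""
    if t < 0.5 - 1.0 / k:
        return 0.0
    if t < 0.5:
        return k * (t - 0.5) + 1.0
    return 1.0


def f_limit(t):
    """The discontinuous limit f."""
    return 0.0 if t < 0.5 else 1.0


def l2_dist_sq(g, h, steps=100_000):
    """Midpoint-rule approximation of integral_0^1 |g(t) - h(t)|^2 dt."""
    dt = 1.0 / steps
    return sum((g((i + 0.5) * dt) - h((i + 0.5) * dt)) ** 2 for i in range(steps)) * dt


for m, n in [(10, 3), (40, 7), (200, 50)]:
    num = l2_dist_sq(lambda t: x_k(t, m), lambda t: x_k(t, n))
    exact = (m - n) ** 2 / (3 * m * m * n)
    print(m, n, num, exact, 4 / (3 * n))   # numeric, closed form, upper bound
```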

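1.4. Orthogonal Projection to Span of an Orthonormal Set
Let $\{\varphi_k\}_{k \in \mathcal{I}}$ be an orthonormal set. Prove that
$$Px = \sum_{k \in \mathcal{I}} \langle x, \varphi_k \rangle \varphi_k$$
is an orthogonal projection operator onto $\operatorname{span}(\{\varphi_k\}_{k \in \mathcal{I}})$.

Solution: The linearity of P follows from the distributivity and linearity in the first argument of the inner product. Verifying idempotency and self-adjointness establishes that P is an orthogonal projection operator, by application of Theorem 1.26.

(i) Idempotency:
$$\begin{aligned}
P^2 x &= \sum_{m \in \mathcal{I}} \Big\langle \sum_{k \in \mathcal{I}} \langle x, \varphi_k \rangle \varphi_k, \varphi_m \Big\rangle \varphi_m \overset{(a)}{=} \sum_{m \in \mathcal{I}} \sum_{k \in \mathcal{I}} \big\langle \langle x, \varphi_k \rangle \varphi_k, \varphi_m \big\rangle \varphi_m \\
&\overset{(b)}{=} \sum_{m \in \mathcal{I}} \sum_{k \in \mathcal{I}} \langle x, \varphi_k \rangle \langle \varphi_k, \varphi_m \rangle \varphi_m \overset{(c)}{=} \sum_{m \in \mathcal{I}} \sum_{k \in \mathcal{I}} \delta_{m-k} \langle x, \varphi_k \rangle \varphi_m \\
&= \sum_{k \in \mathcal{I}} \langle x, \varphi_k \rangle \varphi_k = Px,
\end{aligned}$$
where (a) follows from the distributivity of the inner product; (b) from the linearity in the first argument of the inner product; and (c) from the orthonormality of the set $\{\varphi_k\}_{k \in \mathcal{I}}$.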
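(ii) Self-adjointness:
$$\begin{aligned}
\langle Px, y \rangle &= \Big\langle \sum_{k \in \mathcal{I}} \langle x, \varphi_k \rangle \varphi_k, y \Big\rangle \overset{(a)}{=} \sum_{k \in \mathcal{I}} \big\langle \langle x, \varphi_k \rangle \varphi_k, y \big\rangle \\
&\overset{(b)}{=} \sum_{k \in \mathcal{I}} \langle x, \varphi_k \rangle \langle \varphi_k, y \rangle \overset{(c)}{=} \sum_{k \in \mathcal{I}} \langle \varphi_k, y \rangle \langle x, \varphi_k \rangle \\
&\overset{(d)}{=} \sum_{k \in \mathcal{I}} \langle y, \varphi_k \rangle^* \langle x, \varphi_k \rangle \overset{(e)}{=} \sum_{k \in \mathcal{I}} \big\langle x, \langle y, \varphi_k \rangle \varphi_k \big\rangle \\
&\overset{(f)}{=} \Big\langle x, \sum_{k \in \mathcal{I}} \langle y, \varphi_k \rangle \varphi_k \Big\rangle = \langle x, Py \rangle,
\end{aligned}$$
where (a) follows from the distributivity of the inner product; (b) from the linearity in the first argument of the inner product; in (c) we just exchanged the order of the two factors; (d) follows from the Hermitian symmetry of the inner product; (e) from the conjugate linearity in the second argument of the inner product; and (f) from the distributivity of the inner product.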
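Both properties can be spot-checked numerically. The sketch below uses a hypothetical orthonormal set of two vectors in $\mathbb{C}^3$ and the inner product $\langle u, v \rangle = \sum_i u_i v_i^*$, which is linear in the first argument as in the text. It verifies idempotency $P^2 x = Px$, self-adjointness $\langle Px, y \rangle = \langle x, Py \rangle$, and the orthogonality of the residual $x - Px$ to the set.

```python
def inner(u, v):
    """<u, v> = sum_i u_i conj(v_i): linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def project(x, onset):
    """Px = sum_k <x, phi_k> phi_k for a finite orthonormal set."""
    out = [0j] * len(x)
    for phi in onset:
        c = inner(x, phi)
        out = [o + c * p for o, p in zip(out, phi)]
    return out


s = 2 ** -0.5
onset = [[s, 1j * s, 0j], [0j, 0j, 1 + 0j]]   # a hypothetical orthonormal set in C^3
x = [1 + 2j, -0.5j, 3.0]
y = [0.2, 1 - 1j, -2j]
Px = project(x, onset)
residual = [a - b for a, b in zip(x, Px)]
print(Px, [inner(residual, phi) for phi in onset])   # residual is orthogonal to the set
```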

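1.5. Legendre Polynomials

Consider the vectors $1, t, t^2, t^3, \ldots$ in the vector space $\mathcal{L}^2([-1, 1])$. Using Gram-Schmidt orthogonalization, find an orthonormal set with the same span.

Solution: Let $x_k = t^{k-1}$ for $k \in \mathbb{Z}^+$. We initiate the Gram-Schmidt procedure with
$$\varphi_0 = \frac{x_1}{\|x_1\|} = \frac{1}{\big(\int_{-1}^{1} dt\big)^{1/2}} = \frac{1}{\sqrt{2}}.$$
Continuing the Gram-Schmidt orthogonalization,
$$v_1 = \langle x_2, \varphi_0 \rangle \varphi_0 = \Big(\int_{-1}^{1} \frac{t}{\sqrt{2}}\, dt\Big) \frac{1}{\sqrt{2}} = 0,$$
$$\varphi_1 = \frac{x_2 - v_1}{\|x_2 - v_1\|} = \frac{t}{\big(\int_{-1}^{1} t^2\, dt\big)^{1/2}} = \frac{\sqrt{3}}{\sqrt{2}}\, t,$$
$$v_2 = \langle x_3, \varphi_0 \rangle \varphi_0 + \langle x_3, \varphi_1 \rangle \varphi_1 = \Big(\int_{-1}^{1} \frac{t^2}{\sqrt{2}}\, dt\Big) \frac{1}{\sqrt{2}} + \Big(\int_{-1}^{1} \frac{\sqrt{3}}{\sqrt{2}}\, t^3\, dt\Big) \frac{\sqrt{3}}{\sqrt{2}}\, t = \frac{1}{3},$$
$$\varphi_2 = \frac{x_3 - v_2}{\|x_3 - v_2\|} = \frac{t^2 - \frac{1}{3}}{\big(\int_{-1}^{1} (t^2 - \frac{1}{3})^2\, dt\big)^{1/2}} = \frac{3\sqrt{5}}{2\sqrt{2}} \Big(t^2 - \frac{1}{3}\Big).$$
This process can be continued to find an orthonormal set of arbitrary size. It can be proven by induction that
$$\varphi_n(t) = \sqrt{\frac{2n + 1}{2}}\, \frac{(-1)^n}{2^n n!}\, \frac{d^n}{dt^n}\big[(1 - t^2)^n\big].$$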
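The same computation can be carried out exactly with rational arithmetic if the normalization step is skipped, since normalization introduces square roots. The sketch below is illustrative. It runs Gram-Schmidt on $1, t, t^2, \ldots$ using Python's `fractions.Fraction` and the inner product $\int_{-1}^{1} p(t) q(t)\, dt$, which produces monic orthogonal polynomials. It then checks that each one is proportional to the Rodrigues-type expression $\frac{(-1)^n}{2^n n!} \frac{d^n}{dt^n}(1 - t^2)^n$ from the formula above. Polynomials are stored as coefficient lists indexed by power, and all helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def inner(p, q):
    """<p, q> = integral_{-1}^{1} p(t) q(t) dt for real coefficient lists (index = power)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                 # odd powers integrate to zero on [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total


def sub_scaled(p, q, c):
    """Return p - c*q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]


def gram_schmidt_monomials(count):
    """Orthogonalize 1, t, t^2, ... without normalizing (results are monic)."""
    basis = []
    for k in range(count):
        v = [Fraction(0)] * k + [Fraction(1)]    # the monomial t^k
        for u in basis:
            v = sub_scaled(v, u, inner(v, u) / inner(u, u))
        basis.append(v)
    return basis


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_deriv(p):
    return [i * a for i, a in enumerate(p)][1:]


def rodrigues(n):
    """(-1)^n / (2^n n!) * d^n/dt^n (1 - t^2)^n."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mul(p, [Fraction(1), Fraction(0), Fraction(-1)])
    for _ in range(n):
        p = poly_deriv(p)
    c = Fraction((-1) ** n, 2 ** n * factorial(n))
    return [c * a for a in p]


basis = gram_schmidt_monomials(6)
print(basis[2], inner(basis[2], basis[2]))   # t^2 - 1/3 and its squared norm 8/45
```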

1.6. Complexity of Matrix Multiplication and Strassen's Algorithm

In Example 1.56 we saw an algorithm that multiplies two 2 x 2 matrices using 7 rather than 8 multiplications. To derive an algorithm whose cost grows more slowly than the obvious $O(N^3)$ cost of regular multiplication of N x N matrices, consider the following recursive algorithm:

1. Take N x N matrices with $N = 2^k$.

2. Consider block-matrix multiplication:
$$Q = \begin{bmatrix} Q_{0,0} & Q_{0,1} \\ Q_{1,0} & Q_{1,1} \end{bmatrix} = AB = \begin{bmatrix} A_{0,0} & A_{0,1} \\ A_{1,0} & A_{1,1} \end{bmatrix} \begin{bmatrix} B_{0,0} & B_{0,1} \\ B_{1,0} & B_{1,1} \end{bmatrix},$$
where the blocks $A_{i,j}$ and $B_{i,j}$ are of size N/2 x N/2.

3. Strassen's algorithm allows for the computation of the product Q as four submatrices $Q_{i,j}$ of size N/2 x N/2, using only 7 products of N/2 x N/2 submatrices.

4. The algorithm can be iterated on the 7 subproducts, until they are of size 1.

(i) Show that this algorithm requires $N^{\log_2 7}$ multiplications, a substantial improvement over $N^3$ since $\log_2 7 \approx 2.80735$.

(ii) Evaluate the number of additions of this recursive algorithm and compare to regular matrix multiplication.

(iii) Write out the algorithm in pseudocode.

(iv) Write a Matlab function for matrix multiplication of size $2^K \times 2^K$ and compare running times for regular versus fast matrix multiplication. Comment on the number of operations and the running time.

Solution: TBD
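As an illustration of steps 1 through 4, the sketch below implements the recursion in Python using the standard Strassen products, rather than the Matlab the exercise asks for. It counts scalar multiplications so that the count $7^k = N^{\log_2 7}$ for $N = 2^k$ can be observed directly. It is a plain-list sketch for small integer matrices, not a tuned implementation, and the helper names are hypothetical.

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def split(M):
    """Return the four N/2 x N/2 blocks M00, M01, M10, M11."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def join(C00, C01, C10, C11):
    return [r + s for r, s in zip(C00, C01)] + [r + s for r, s in zip(C10, C11)]


def strassen(A, B, counter):
    """Multiply 2^k x 2^k matrices; counter[0] accumulates scalar multiplications."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A00, A01, A10, A11 = split(A)
    B00, B01, B10, B11 = split(B)
    # The 7 block products of Strassen's algorithm.
    M1 = strassen(add(A00, A11), add(B00, B11), counter)
    M2 = strassen(add(A10, A11), B00, counter)
    M3 = strassen(A00, sub(B01, B11), counter)
    M4 = strassen(A11, sub(B10, B00), counter)
    M5 = strassen(add(A00, A01), B11, counter)
    M6 = strassen(sub(A10, A00), add(B00, B01), counter)
    M7 = strassen(sub(A01, A11), add(B10, B11), counter)
    C00 = add(sub(add(M1, M4), M5), M7)
    C01 = add(M3, M5)
    C10 = add(M2, M4)
    C11 = add(add(sub(M1, M2), M3), M6)
    return join(C00, C01, C10, C11)


def naive(A, B):
    """Regular O(N^3) matrix multiplication, for comparison."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


for k in range(4):
    N = 2 ** k
    A = [[(3 * i + j) % 5 - 2 for j in range(N)] for i in range(N)]
    B = [[(i - 2 * j) % 7 - 3 for j in range(N)] for i in range(N)]
    count = [0]
    assert strassen(A, B, count) == naive(A, B)
    print(N, count[0], N ** 3)   # Strassen multiplications vs. regular N^3
```

The additions performed inside `add` and `sub` are not counted here. Part (ii) of the exercise is about exactly that trade-off.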



Exercises 

1.1. Multiplication by an Orthogonal Matrix 

Prove that multiplication by an orthogonal matrix U preserves lengths (that is, $\|Ux\| = \|x\|$ for any x) and angles (that is, $\langle Ux, Uy \rangle = \langle x, y \rangle$ for any x and y). Show also that the eigenvalues of U have unit absolute value.



1.2. Best Approximation in $\mathbb{R}^3$
Mimic what we did in Section 1.1 and find the best approximation in $\mathbb{R}^3$ of the vector x from Figure P1.2-1 in the $(e_1, e_2)$-plane. Find the difference between the vector and its approximation and prove it is the smallest possible.





Figure P1.2-1: Orthogonal projection onto a subspace. Here, $x \in \mathbb{R}^3$ and $x_{12}$ is its projection onto the span of $\{e_1, e_2\}$. Note that $x - x_{12}$ is orthogonal to the span of $\{e_1, e_2\}$, and that $x_{12}$ is the closest vector to x in the span of $\{e_1, e_2\}$.



1.3. Matrices Representing Bases and Frames 
Given is the following matrix: 



\<P0 ¥>i 



r- 



x" 



Si 

1 

v3 



V2 

1 

V3 
(i) Does this matrix represent a basis? If yes, is it an orthonormal basis? If not, why not?

(ii) Each $\varphi_i = [x_i\ y_i\ z_i]^T$, i = 0, 1, 2, is a vector in a three-dimensional space $\mathbb{R}^3$. Project each $\varphi_i$ onto the x-y plane, that is, the two-dimensional space $\mathbb{R}^2$. Write the resulting matrix $\Phi'$, where each vector is now in $\mathbb{R}^2$. Does this matrix represent a frame? If yes, is it a tight frame? If not, why not?

1.4. Linear Independence 

Find the values of the parameter $\alpha \in \mathbb{C}$ such that the following set U is linearly independent:

V 



\ 


"0 a 2 ] 




I 


j\ 


" 


'0 5 




[2 j - 2 








1 




"0 





1 


a- 1 


' 


Ja 


1 



For $\alpha = j$, express the matrix as a linear combination of elements of U.

1.5. Continuity of the Inner Product 

Show that an inner product is continuous in both of its variables, that is, that
$$\lim_{\|h_1\|, \|h_2\| \to 0} |\langle x + h_1, y + h_2 \rangle - \langle x, y \rangle| = 0.$$

1.6. Inner Products on $\mathbb{C}^N$

Prove that $\langle x, y \rangle = y^* A x$ is a valid inner product if and only if A is a Hermitian, positive definite matrix.

1.7. Norms on $\mathbb{C}^N$

Consider the vector space $\mathbb{C}^N$ of finite sequences $x = [x_0\ x_1\ \cdots\ x_{N-1}]^T$. Prove that $\nu_1$ and $\nu_2$ are norms on $\mathbb{C}^N$, where
$$\nu_1(x) = \sum_{k=0}^{N-1} |x_k|, \qquad \nu_2(x) = \Big(\sum_{k=0}^{N-1} |x_k|^2\Big)^{1/2}. \tag{P1.7-1}$$
(Hint: For $\nu_2$, use Minkowski's inequality (1.198a).)

1.8. Orthogonal Transforms and $\infty$ Norm

Orthogonal transforms conserve the 2 norm, but not others, in general. Here we consider the $\infty$ norm defined in (1.35b).

(i) Consider the set of real orthogonal transforms on $\mathbb{R}^2$, that is, plane rotations and rotoinversions. Give the best lower and upper bounds $a_2$ and $b_2$ so that
$$a_2 \le \|T_2 x\|_\infty \le b_2$$
holds for all orthogonal $T_2$ and all vectors x of unit 2 norm.

(ii) Extend (i) and give the best lower and upper bounds $a_N$ and $b_N$ for the general case of $\mathbb{R}^N$ with N > 2.

1.9. Norms

Let V be the set of all real-valued continuous functions defined on the interval [0, 1], and define $\kappa_1$ and $\kappa_2$ as
$$\kappa_1(x) = \int_0^1 |x(t)|\, dt, \qquad \kappa_2(x) = \Big(\int_0^1 |x(t)|^2\, dt\Big)^{1/2}.$$

(i) Check that V is a vector space.
(ii) Prove that $\kappa_1$ and $\kappa_2$ are norms on V.

(Hint: For $\kappa_2$, use Minkowski's inequality (1.198b).)

1.10. Cauchy-Schwarz Inequality, Triangle Inequality and the Parallelogram Law
Prove the following:

(i) Cauchy-Schwarz inequality given in (1.24).
(ii) Triangle inequality given in Definition 1.9.
(iii) Parallelogram law given in (1.25).
(iv) A vector space V is an inner product space if and only if it is a normed vector space in which the parallelogram law holds. Use the polarization identity, which, for a real Hilbert space, is
$$\langle x, y \rangle = \tfrac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big), \tag{P1.10-1a}$$
while, for a complex Hilbert space, it is
$$\langle x, y \rangle = \tfrac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2 + j\|x + jy\|^2 - j\|x - jy\|^2\big). \tag{P1.10-1b}$$

1.11. Norm Induced by an Inner Product

Let the mapping $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$ be an inner product defined on the vector space V. Show that the function
$$\nu(x) = \sqrt{\langle x, x \rangle}, \quad \text{for } x \in V,$$
defines a norm on V.

1.12. Convergence of the Inner Product in $\ell^2(\mathbb{Z})$

Let x and y be sequences in $\ell^2(\mathbb{Z})$. Show that the series (1.20b) defining $\langle x, y \rangle$ converges absolutely. Since convergence of doubly-infinite series requires absolute convergence, this shows that the $\ell^2(\mathbb{Z})$ inner product is always well defined for vectors in $\ell^2(\mathbb{Z})$ (see Appendix [EST2).

1.13. Distances Not Necessarily Induced by Norms

A distance, or metric, $d : V \times V \to \mathbb{R}$ is a function with the following properties:

(i) Nonnegativity: $d(x, y) \ge 0$ for every x, y in V.
(ii) Symmetry: $d(x, y) = d(y, x)$ for every x, y in V.
(iii) Triangle inequality: $d(x, y) + d(y, z) \ge d(x, z)$ for every x, y, z in V.
(iv) Identity of indiscernibles: $d(x, x) = 0$, and $d(x, y) = 0$ implies x = y.

The discrete metric is given by
$$d(x, y) = \begin{cases} 0, & \text{if } x = y; \\ 1, & \text{if } x \ne y. \end{cases}$$
Show that the discrete metric is a valid distance and is not induced by any norm.

1.14. Definition of $\infty$-norm

Show that the $\infty$ norm in (1.35b) is the natural extension of the p norm in (1.35a), by proving
$$\lim_{p \to \infty} \|x\|_p = \max_{i = 0, 1, \ldots, N-1} |x_i| \quad \text{for any } x \in \mathbb{R}^N.$$
(Hint: Normalize x by dividing by the entry of the largest magnitude. Compute the limit for the resulting vector.)

1.15. Quasinorms with p < 1

Equation (1.35a) does not yield a valid norm when p < 1.

(i) Show that Definition 1.8(iii) fails to hold for (1.35a) with p = 1/2.
(ii) Show that for $x \in \mathbb{R}^N$, $\lim_{p \to 0} \|x\|_p^p$ gives the number of nonzero components in x.
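The two limits in Exercises 1.14 and 1.15(ii) can be observed numerically before they are proven. The sketch below evaluates $\|x\|_p$ for a large p using the normalization suggested in the hint to Exercise 1.14, which avoids overflow, and evaluates $\|x\|_p^p$ for a small p. The test vector and the values of p are arbitrary assumptions, and this is an illustration rather than a proof.

```python
def p_norm(x, p):
    """||x||_p = (sum |x_i|^p)^(1/p), computed after dividing by the largest magnitude."""
    m = max(abs(v) for v in x)
    if m == 0:
        return 0.0
    return m * sum((abs(v) / m) ** p for v in x) ** (1.0 / p)


def p_norm_to_p(x, p):
    """||x||_p^p = sum |x_i|^p over the nonzero entries (0^p = 0 for p > 0)."""
    return sum(abs(v) ** p for v in x if v != 0)


x = [3.0, -4.0, 0.0, 1.0]
for p in (1, 2, 10, 200):
    print(p, p_norm(x, p))           # approaches max |x_i| = 4 as p grows
for p in (1.0, 0.1, 1e-6):
    print(p, p_norm_to_p(x, p))      # approaches the number of nonzeros, 3, as p -> 0
```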

1.16. Equivalence of Norms on Finite-Dimensional Spaces

Two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on a vector space V are called equivalent when there exist finite constants $c_a$ and $c_b$ such that
$$\|v\|_a \le c_a \|v\|_b \quad \text{and} \quad \|v\|_b \le c_b \|v\|_a \quad \text{for all } v \in V.$$

(i) Show that the 1 norm, 2 norm, and $\infty$ norm are equivalent by proving
$$\|x\|_\infty \le \|x\|_2 \le \|x\|_1$$
and
$$\|x\|_1 \le \sqrt{N}\, \|x\|_2 \le N \|x\|_\infty,$$
for all $x \in \mathbb{R}^N$.

(ii) Show by counterexamples that the 1, 2 and $\infty$ norms are not equivalent for infinite-dimensional spaces.

1.17. Nesting of $\ell^p$ Spaces

Prove that if $x \in \ell^p(\mathbb{Z})$ and p < q, then $x \in \ell^q(\mathbb{Z})$. This proves (1.37).

1.18. $\mathcal{L}^p$ Spaces

(i) Show that the parallelogram law holds in $\mathcal{L}^2([0, 1])$.

(ii) Show that the parallelogram law does not hold in $\mathcal{L}^p([0, 1])$ for $p \ne 2$. To this end, consider the functions x(t) = t and y(t) = 1 - t.

This shows that, among all $\mathcal{L}^p([0, 1])$ spaces, only $\mathcal{L}^2([0, 1])$ is a candidate for being a Hilbert space.

1.19. Closed Subspaces and $\ell^0(\mathbb{Z})$

Let $\ell^0(\mathbb{Z})$ denote the set of complex-valued sequences with a finite number of nonzero entries.

(i) Show that $\ell^0(\mathbb{Z})$ is a subspace of $\ell^2(\mathbb{Z})$.
(ii) Show that $\ell^0(\mathbb{Z})$ is not a closed subspace of $\ell^2(\mathbb{Z})$.

1.20. Infinite Sequences and Completeness
Consider the sequence
$$x = [\,\cdots\ \ 1/\sqrt{2}\ \ 1/\sqrt{2}\ \ \cdots\,]^T,$$
where the boxed entry is at position zero. Is this sequence in $\ell^2(\mathbb{Z})$? Why? If it is, does the set of sequences $\{x_{n-2k}\}_{k \in \mathbb{Z}}$ form a basis for $\ell^2(\mathbb{Z})$? Why?

1.21. Completeness

Consider the inner product space $\mathcal{P}$ of all polynomials with
$$\langle p, q \rangle = \int_0^1 p(t)\, q^*(t)\, dt,$$
and let $p_k$ be the following Cauchy sequence in $\mathcal{P}$:

p*w = E4**- 

Prove that $\mathcal{P} \subset \mathcal{L}^2([0, 1])$ is not a Hilbert space.

1.22. Completeness of $\mathbb{C}^N$

Prove that $\mathbb{C}^N$ equipped with the p norm is a complete vector space for any $p \in [1, \infty)$ or $p = \infty$. You may assume that $\mathbb{C}$ itself is complete.

(Hint: Show that having a Cauchy sequence in $\mathbb{C}^N$ implies each of the N components is a Cauchy sequence.)

1.23. Cauchy Sequences 

Show that in a normed vector space, every convergent sequence is a Cauchy sequence. 

1.24. Norms of Operators 

(i) Consider a symmetric matrix A with
$$A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}.$$
Calculate $\|A\|$ and $\|A^{-1}\|$.

(Hint: Calculate the eigenvalues of the matrix A.)
(ii) Consider operators that map $\ell^2(\mathbb{Z})$ to itself. For the following operators, indicate their norm, or bounds on their norm.



(i) 



(Ax) n 



pJ e„ 



n £ 
(ii)



(Ax) 2 



X2n + X2ri 



(Ax) 2 



■ X2n+1, 



n £ 



1.25. Relation Between Operator Norm and Eigenvalues
Show that for any $A \in \mathbb{C}^{M \times N}$,
$$\|A\|_2 = \sqrt{\lambda_{\max}(A^*A)},$$
where $\lambda_{\max}$ denotes the largest eigenvalue.

Hint: Apply the diagonalization (1.223a) to $A^*A$.
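The identity $\|A\|_2 = \sqrt{\lambda_{\max}(A^*A)}$ suggests a simple numerical method: power iteration on $A^*A$. The sketch below is illustrative and assumes a real, nonzero A and a starting vector with a component along the top eigenvector. It is tried on the hypothetical matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, for which $A^TA$ has largest eigenvalue $(3 + \sqrt{5})/2$, so that $\|A\|_2 = (1 + \sqrt{5})/2$.

```python
import math


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def op_norm_2(A, iters=200):
    """Estimate ||A||_2 = sqrt(lambda_max(A^T A)) by power iteration (real, nonzero A)."""
    At = [list(col) for col in zip(*A)]
    v = [1.0] * len(A[0])                 # assumed to have a component along the top eigenvector
    lam = 0.0
    for _ in range(iters):
        w = matvec(At, matvec(A, v))      # w = A^T A v
        lam = math.sqrt(sum(c * c for c in w))
        v = [c / lam for c in w]          # keep the iterate normalized
    return math.sqrt(lam)


print(op_norm_2([[1.0, 1.0], [0.0, 1.0]]))   # approximately 1.618, the golden ratio
print(op_norm_2([[3.0, 0.0], [0.0, -5.0]]))  # approximately 5
```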

1.26. Adjoint Operators

Prove parts (iv), (vi), (vii), and (viii) of Theorem 1.21.

1.27. Eigenvalues of Definite Operators

Let A be a self-adjoint operator on Hilbert space H.

(i) Let $\lambda$ be an eigenvalue of A. Show that A positive semidefinite implies $\lambda \ge 0$, and furthermore A positive definite implies $\lambda > 0$.
(ii) Show that the existence of a nonpositive eigenvalue implies A is not positive definite, and furthermore the existence of a negative eigenvalue implies A is not positive semidefinite.

1.28. Operator Expansion

Let $A : H \to H$ be a bounded linear operator with $\|A\| < 1$.

(i) Show that I - A is invertible.
(ii) Show that, for every y in H,
$$(I - A)^{-1} y = \sum_{k=0}^{\infty} A^k y.$$
(iii) In practice one can only compute a finite number of terms in the series. For a given $\|y\|$ and K terms in the expansion, find an upper bound on the error:
$$\Big\| (I - A)^{-1} y - \sum_{k=0}^{K-1} A^k y \Big\|.$$
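A small numerical experiment shows the series in (ii) at work. The sketch below uses a hypothetical 2 x 2 matrix A whose Frobenius norm is about 0.62, so $\|A\|_2 < 1$. It accumulates $\sum_{k=0}^{K-1} A^k y$ without forming matrix powers and compares the result with the exact $(I - A)^{-1} y$ from the closed-form 2 x 2 inverse. The error shrinks as K grows. The sketch does not state the bound that part (iii) asks for.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def neumann_partial(A, y, K):
    """Sum_{k=0}^{K-1} A^k y, accumulated without forming matrix powers."""
    total = [0.0] * len(y)
    term = list(y)                        # A^0 y
    for _ in range(K):
        total = [t + s for t, s in zip(total, term)]
        term = matvec(A, term)            # next term A^{k+1} y
    return total


def solve_I_minus_A_2x2(A, y):
    """Exact (I - A)^{-1} y for a 2 x 2 A, via the closed-form 2 x 2 inverse."""
    (a, b), (c, d) = A
    det = (1 - a) * (1 - d) - b * c
    return [((1 - d) * y[0] + b * y[1]) / det, (c * y[0] + (1 - a) * y[1]) / det]


def err(A, y, K):
    """Euclidean norm of the truncation error after K terms."""
    exact = solve_I_minus_A_2x2(A, y)
    approx = neumann_partial(A, y, K)
    return sum((e - p) ** 2 for e, p in zip(exact, approx)) ** 0.5


A = [[0.5, 0.2], [0.1, 0.3]]   # hypothetical example with ||A||_2 < 1
y = [1.0, 0.0]
for K in (1, 5, 10, 40):
    print(K, err(A, y, K))
```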



"1 


0] 


1 


1 





1 



1.29. Projection Operators 
Let 



Find all projection operators onto the range of B. Specify which one is the orthogonal 
projection operator onto the range of B. 

1.30. Projection via Domain Restriction 

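Recall the definition of $1_{\mathcal{I}} : \mathcal{L}^2(\mathbb{R}) \to \mathcal{L}^2(\mathbb{R})$ in (1.58).

(i) Show that $1_{\mathcal{I}}$ is an orthogonal projection operator.

(ii) Show that if $\mathcal{I}_1$ and $\mathcal{I}_2$ are disjoint subsets of $\mathbb{R}$, the ranges of the associated operators, $\mathcal{R}(1_{\mathcal{I}_1})$ and $\mathcal{R}(1_{\mathcal{I}_2})$, are orthogonal.

(iii) Under what condition does the orthogonal decomposition $\mathcal{L}^2(\mathbb{R}) = \mathcal{R}(1_{\mathcal{I}_1}) \oplus \mathcal{R}(1_{\mathcal{I}_2})$ hold?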

1.31. Inverses, Adjoints, and Projection Operators
Prove Theorem 1.29.

1.32. Systems of Linearly Independent Vectors

Given a set of M-dimensional linearly independent vectors $\{a_0, a_1, \ldots, a_{N-1}\}$, with N < M, and a vector b in $\mathbb{R}^M$ outside their span:

(i) Is the vector set $\{a_0, a_1, \ldots, a_{N-1}, b\}$ a basis? Explain.

(ii) Give an expression for the distance between b and the projection of $x \in \mathbb{R}^N$ onto $A = [a_0\ a_1\ \cdots\ a_{N-1}]$.

(iii) Write the equations for the components of the error vector.

(iv) Find the least-squares solution $\hat{x}$ and show that the projection of b onto the space spanned by the columns of A can be computed with the use of $P = A(A^TA)^{-1}A^T$.
(v) Prove that P is an orthogonal projection matrix.

1.33. An Inner Product on Random Vectors

Consider the set of complex random vectors of length N with the restriction that each component has finite variance. This set forms a vector space over $\mathbb{C}$.

(i) Show that $\langle x, y \rangle = \sum_k E[x_k y_k^*]$ is a valid inner product on this set. In doing so, explain the meaning of x = 0.
(ii) What condition makes random vectors orthogonal under this inner product?
(iii) Zero-mean random vectors x and y are called uncorrelated when the matrix $E[xy^T]$ is all 0s. Are orthogonal vectors uncorrelated? Are uncorrelated vectors orthogonal?
(iv) Let x and y be zero-mean Gaussian vectors. Does $\langle x, y \rangle = 0$ imply x and y are independent? If not, what weaker condition is implied?

Note that the inner product defined in this exercise is not often useful in deriving optimal estimators; see Solved Exercise 1.34.

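1.34. Bayesian Linear MMSE Estimation via Orthogonality Principle

As in Example 1.64, let $x$ and $y$ be jointly distributed real random vectors with $E[x] = \mu_x$, $E[y] = \mu_y$, and
$$E\left[ \begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix} \begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix}^T \right] = \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix}.$$

Use the projection theorem to find the linear minimum mean-squared error (LMMSE) estimator of $x$ as a function of $y$, that is, the optimal estimator of the form $\hat{x} = Ay + b$.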

1.35. Riesz Bases

(i) Prove that the standard basis in $\ell^2(\mathbb{Z})$ is a Riesz basis with $\lambda_{\min} = \lambda_{\max} = 1$.
(ii) Let $\{e_k\}_{k\in\mathbb{Z}}$ denote the standard basis in $\ell^2(\mathbb{Z})$ and define the following scaled version:
$$\varphi_k = 2^k e_k, \qquad k \in \mathbb{Z}.$$
Prove that $\{\varphi_k\}_{k\in\mathbb{Z}}$ is a basis, but there is neither a positive $\lambda_{\min}$ nor a finite $\lambda_{\max}$ such that (1.80) in the definition of Riesz basis holds.
(iii) Let
$$\psi_k = \cos(k)\, e_k, \qquad k \in \mathbb{Z}.$$
Prove or disprove that $\{\psi_k\}_{k\in\mathbb{Z}}$ is a basis for $\ell^2(\mathbb{Z})$, and prove or disprove that $\{\psi_k\}_{k\in\mathbb{Z}}$ is a Riesz basis for $\ell^2(\mathbb{Z})$.

1.36. Basis that is not a Riesz Basis
Complete Example 1.23(ii) by showing that the set $\{\varphi_k\}_{k\in\mathbb{N}}$ defined there is a basis for $\ell^2(\mathbb{N})$ but not a Riesz basis.
1.37. $\ell^p$ Norms in Different Bases

Consider $\mathbb{R}^2$ and the standard basis $\{e_0, e_1\}$ as well as another orthonormal basis $\{\varphi_0, \varphi_1\}$. Take any vector $x$ expressed as $[x_0\ x_1]^T$ in the standard basis, and as $[\alpha_0\ \alpha_1]^T$ in the $\{\varphi_0, \varphi_1\}$ basis with $\alpha_k = \langle x, \varphi_k\rangle$, for $k = 0, 1$.

(i) Among $\ell^p$ norms, for $p \in [1, \infty]$, show that the 2 norm is the only one invariant with respect to the basis, that is, for arbitrary vectors,
$$\left\| [x_0\ x_1]^T \right\|_p = \left\| [\alpha_0\ \alpha_1]^T \right\|_p$$
holds only for $p = 2$.
(ii) Generalize this to biorthogonal bases in $\mathbb{R}^2$.
(iii) What can you say about arbitrary dimensions?
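A single counterexample, sketched below with NumPy, already shows that the 1 and $\infty$ norms change under a change of orthonormal basis while the 2 norm does not. The rotation angle $\pi/4$ and the vector $x = [1\ 0]^T$ are arbitrary choices; one example does not by itself prove the "only for $p = 2$" statement for every $p$.

```python
import numpy as np

theta = np.pi / 4
# Rows are the orthonormal basis vectors varphi_0, varphi_1 (a rotated standard basis).
Phi = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
x = np.array([1.0, 0.0])      # coordinates [x_0, x_1] in the standard basis
alpha = Phi @ x               # alpha_k = <x, varphi_k>

for p in (1, 2, 3, np.inf):
    print(p, np.linalg.norm(x, p), np.linalg.norm(alpha, p))
```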

1.38. Symmetric and Antisymmetric Functions

Consider the vector space of real functions in $\mathcal{L}^2([-\pi, \pi])$ over the real numbers, i.e., square-integrable real functions on the interval $[-\pi, \pi]$. It has an orthonormal basis given by
$$\left\{ \frac{1}{\sqrt{2\pi}}, \frac{\cos t}{\sqrt{\pi}}, \frac{\sin t}{\sqrt{\pi}}, \frac{\cos 2t}{\sqrt{\pi}}, \frac{\sin 2t}{\sqrt{\pi}}, \frac{\cos 3t}{\sqrt{\pi}}, \frac{\sin 3t}{\sqrt{\pi}}, \ldots \right\}.$$

Consider the following two subspaces: $S$, the space of symmetric functions, that is, $f(t) = f(-t)$, and $A$, the space of antisymmetric functions, $f(t) = -f(-t)$.

(i) Show how any function $f(t)$ from $\mathcal{L}^2([-\pi, \pi])$ can be written as $f(t) = f_s(t) + f_a(t)$ with $f_s(t) \in S$ and $f_a(t) \in A$.
(ii) Give orthonormal bases for $S$ and $A$.

(iii) Verify that $\mathcal{L}^2([-\pi, \pi]) = S \oplus A$.

1.39. Orthonormal Sets
Let $E = \mathcal{L}^2([-\pi, \pi])$.

(i) Consider the set $\varphi_k(t) = \frac{1}{\sqrt{2\pi}} e^{jkt}$ for $k \in \mathbb{Z}$. Prove that $\{\varphi_k(t)\}_{k\in\mathbb{Z}}$ is an orthonormal set in $E$.
(ii) Consider $\{\varphi_k(t) = \frac{1}{\sqrt{\pi}} \sin(kt)\}_{k\ge1}$. Prove that $\{\varphi_k(t)\}_{k\ge1}$ is an orthonormal set in $E$ but that it is not a basis.
(iii) Consider $\{\psi_k(t) = \frac{1}{\sqrt{\pi}} \cos(kt)\}_{k\ge1}$. Show that the set $\{\frac{1}{\sqrt{2\pi}}, \varphi_k(t), \psi_k(t)\}_{k\ge1}$ is an orthonormal set in $E$.

(Hint: See also Problem 1.38.)

1.40. Least-Squares Approximation in an Orthonormal Representation

Assume a finite-dimensional space $\mathbb{R}^N$ and an orthonormal basis $\{\varphi_1, \varphi_2, \ldots, \varphi_N\}$. Any vector $x$ can thus be written as $x = \sum_{i=1}^{N} \alpha_i \varphi_i$ where $\alpha_i = \langle x, \varphi_i\rangle$. Consider the best approximation to $x$ in the least-squares sense and living on the subspace spanned by the first $k$ vectors, $\{\varphi_1, \varphi_2, \ldots, \varphi_k\}$, or $\hat{x} = \sum_{i=1}^{k} \beta_i \varphi_i$. Prove that $\beta_i = \alpha_i$ for $i = 1, 2, \ldots, k$, by showing that this choice minimizes $\|x - \hat{x}\|$.
(Hint: Use Parseval's equality.)

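1.41. Orthogonal Projection in $\mathbb{R}^N$

Let $\{\varphi_k\}_{k\in\mathcal{K}}$ be an orthonormal set. Prove that
$$Px = \sum_{k\in\mathcal{K}} \langle x, \varphi_k\rangle \varphi_k$$
is an orthogonal projection operator onto $\operatorname{span}(\{\varphi_k\}_{k\in\mathcal{K}})$.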

1.42. Generalized Parseval's Equalities

Prove (1.109) and (1.110) using the biorthogonality of the pair of bases.

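1.43. Biorthogonal Pair of Bases of Cosine Functions

Consider the set $\Phi = \{\varphi_k\}_{k\in\mathbb{N}}$ defined in Example 1.31 and the sets $\Psi = \{\psi_k\}_{k\in\mathbb{N}}$ and $\tilde{\Psi} = \{\tilde{\psi}_k\}_{k\in\mathbb{N}}$ defined in Example 1.37.

(i) Show that $\Psi$ and $\tilde{\Psi}$ satisfy the biorthogonality condition (1.102).
(ii) Show that $\operatorname{span}(\Psi) = \operatorname{span}(\tilde{\Psi}) = \operatorname{span}(\Phi)$.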

1.44. Dual Bases

Let $\Phi = \{\varphi_k\}_{k\in\mathcal{K}}$ be a Riesz basis for Hilbert space $H$ with constants $\lambda_{\min}$ and $\lambda_{\max}$.

(i) Show that the dual of the dual of $\Phi$ is $\Phi$.

(ii) Show that the dual of $\Phi$ is $\Phi$ if and only if $\Phi$ is an orthonormal basis.
(iii) Show that the dual of $\Phi$ is a Riesz basis with constants $1/\lambda_{\max}$ and $1/\lambda_{\min}$.

1.45. Oblique Projection Property
Prove Theorem 1.46.

1.46. Normal Equations

Verify that $\hat{x}$ in (1.127) is the orthogonal projection of $x$ onto the subspace $\operatorname{span}(\{\psi_k\}_{k\in\mathcal{K}})$. Consider both the case when $\{\psi_k\}_{k\in\mathcal{K}}$ are linearly independent as well as when they are linearly dependent.

1.47. Orthogonal Projection in Coefficient Space

Let $\{\varphi_k\}_{k\in\mathcal{K}}$ be a basis for the Hilbert space $H$, and let $\{\psi_k\}_{k\in\mathcal{K}}$ be another set in $H$. Denote the orthogonal projection of $x$ onto $\operatorname{span}(\{\psi_k\}_{k\in\mathcal{K}})$ by $\hat{x}$, and denote representations of $x$ and $\hat{x}$ with respect to $\{\varphi_k\}_{k\in\mathcal{K}}$ by $x = \Phi\alpha$ and $\hat{x} = \Phi\hat{\alpha}$.

(i) Find an expression for $P : \ell^2(\mathcal{K}) \to \ell^2(\mathcal{K})$ that maps $\alpha$ to $\hat{\alpha}$. You may use the operators $\Phi$, $\tilde{\Phi}$, and $\Psi$.
(ii) Show that $P$ is a projection operator.
(iii) Show that $P$ is an orthogonal projection operator if and only if $\{\varphi_k\}_{k\in\mathcal{K}}$ is an orthonormal basis for $H$.

1.48. Successive Approximation with Nonorthogonal Basis

Let $\{\varphi_i\}_{i\in\mathbb{N}}$ be a linearly independent set in the Hilbert space $H$. For each $k \in \mathbb{N}$, let $S_k = \operatorname{span}(\{\varphi_0, \varphi_1, \ldots, \varphi_{k-1}\})$ and let $\hat{x}^{(k)}$ denote the best approximation of $x$ in $S_k$. Prove that the recursive algorithm given in (1.131) provides a sequence of best approximations that all satisfy the normal equations (1.128a).
(Hint: Use induction over $k$.)

1.49. Exploring the Definition of Frame
Let $\Psi = \{\psi_k\}_{k\in\mathcal{J}} \subset H$ be a frame.

(i) Show that if $\Psi$ is not linearly independent, the following is not true: For any expansion $x = \sum_{k\in\mathcal{J}} \alpha_k \psi_k$, condition (1.80) holds.

(ii) Show that for any $x \in H$, there exists an expansion $x = \sum_{k\in\mathcal{J}} \alpha_k \psi_k$ such that condition (1.80) holds.

1.50. Frame of Cosine Functions

Find the lower and upper frame bounds for the frame
$$\{1, \sqrt{2}\cos(\pi t)\} \cup \{\sqrt{2}\cos(2\pi k t),\ 2\cos(\pi t)\cos(2\pi k t)\}_{k\in\mathbb{Z}^+}$$
given in Example 1.44.

1.51. Dual Frame

Consider the space $\mathbb{R}^4$. It is clear that
$$\varphi_{k,n} = \delta_{n-k}, \qquad k = 0, 1, \ldots, 3,\ n = 0, 1, \ldots, 3,$$
form a basis for this space.
(i) Consider
$$\psi_{k,n} = \delta_{n-k} - \delta_{n-k-1}, \quad k = 0, 1, 2, \qquad \text{and} \qquad \psi_{3,n} = \cdots.$$
Show that the 4-dimensional set $\{\psi_k\}_{k=0}^{3}$ does not form a basis for $\mathbb{R}^4$. Which vectors in $\mathbb{R}^4$ are not in $\operatorname{span}(\{\psi_k\}_{k=0}^{3})$?
(ii) Show that $F = \{\varphi_k, \psi_k\}_{k=0}^{3}$ is a frame. Compute the frame bounds $A$ and $B$.
(iii) Find the canonical dual frame to $F$.
(iv) Let the vectors $\varphi_k$ and $\psi_k$ be given as columns of the following two matrices:
$$\Phi = [\,\cdots\,], \qquad \Psi = [\,\cdots\,].$$
Comment on the completeness of $\Phi$ and $\Psi$ in $\mathbb{R}^4$, show that $F = [\Phi\ \Psi]$ is a frame, and find its frame bounds as well as a dual frame.


1.52. Properties of Dual Pair of Frames

This exercise develops results for dual pairs of frames that parallel the results for biorthogonal pairs of bases established in Section 1.5.3. Let $\Phi = \{\varphi_k\}_{k\in\mathcal{J}}$ and $\tilde{\Phi} = \{\tilde{\varphi}_k\}_{k\in\mathcal{J}}$ be a dual pair of frames for Hilbert space $H$.

(i) Show that (1.144) holding for every $x$ in $H$ implies (1.145) holds for every $x$ in $H$. This establishes that the roles of the two frames in a dual pair can be reversed.
(ii) Let $x$ and $y$ be any vectors in $H$. Show that if $\alpha = \tilde{\Phi}^* x$ and $\beta = \Phi^* y$, then $\langle x, y\rangle = \langle \alpha, \beta\rangle$. This shows how frame expansions enable computation of an inner product in $H$ through an $\ell^2(\mathcal{J})$ inner product.
(iii) Let $\mathcal{K} \subset \mathcal{J}$ and recall the notations (1.94) and (1.95) for restricted synthesis and analysis operators. Determine a sufficient condition under which $\Phi_{\mathcal{K}} \tilde{\Phi}_{\mathcal{K}}^*$ is a projection operator. (Forcing $\tilde{\Phi}$ and $\Phi$ to form a biorthogonal pair of bases is not an adequate answer.)

1.53. Random Frame

Consider the square $[-1, 1] \times [-1, 1]$ of pairs in $\mathbb{R}^2$. Pick $N$ points at random with uniform distribution from this set, and let their coordinates be $(x_i, y_i)$, $i = 1, 2, \ldots, N$. Create the matrix $M$ having the entries $(x_i, y_i)$ as rows. What are the condition(s) that the pairs $(x_i, y_i)$ should fulfill so that $M$ is a frame?

(Hint: This is a purely geometrical exercise. Think of what is needed to represent a vector in a two-dimensional space.)

1.54. Tight Frames as Projections from Orthonormal Bases

Let $W_N = e^{-j2\pi/N}$.

(i) Consider a frame expansion of $\mathbb{R}^2$ that is given by the $N$ vectors $\varphi_k$, with
$$\varphi_k = \left[\Re\{W_N^k\}\ \ \Im\{W_N^k\}\right]^T, \qquad k = 0, \ldots, N-1.$$
Plot the vectors of the frame on the unit circle, for $N = 3, 4, 5$.
(ii) Consider now the DFT matrix:
$$F = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)} \end{bmatrix}, \qquad \text{(P1.54-1)}$$
and let $F_p$ be the matrix formed by its first $p$ columns and normalized correspondingly; that is, having as its $i$th column vector (for $i = 0, \ldots, p-1$)
$$\frac{1}{\sqrt{p}} \left[ W_N^{in} \right]_{n=0}^{N-1}.$$
Show that $F_p$ is a tight frame with redundancy factor $N/p$, that is,
$$F_p^* F_p = \frac{N}{p} I.$$
(Hint: Use the identity $\sum_{k=0}^{N-1} W_N^{kn} = N \delta_{n - \ell N}$, where $\delta_n$ is the Kronecker delta function, and $\ell \in \mathbb{Z}$.)
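The identity $F_p^* F_p = (N/p) I$ in (ii) can be spot-checked numerically. The helper `partial_dft_frame` below is a hypothetical name used only in this NumPy sketch; it builds $F_p$ entrywise as $W_N^{in}/\sqrt{p}$, and the sizes $N = 8$, $p = 3$ are arbitrary.

```python
import numpy as np

def partial_dft_frame(N, p):
    """Hypothetical helper: F_p, the first p columns of the N x N DFT matrix scaled by 1/sqrt(p)."""
    n = np.arange(N).reshape(-1, 1)          # row index n = 0, ..., N-1
    i = np.arange(p).reshape(1, -1)          # column index i = 0, ..., p-1
    return np.exp(-2j * np.pi * n * i / N) / np.sqrt(p)   # entries W_N^{i n} / sqrt(p)

N, p = 8, 3
Fp = partial_dft_frame(N, p)
S = Fp.conj().T @ Fp                          # F_p^* F_p, a p x p matrix
print(np.allclose(S, (N / p) * np.eye(p)))    # tight frame with bound N/p
print(np.allclose(np.linalg.norm(Fp, axis=1), 1.0))   # each row (frame vector) has unit norm
```

Setting $p = N$ recovers the unitary (normalized) DFT, for which the frame bound is 1.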



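1.55. Tight Frame with Nonequal-Norm Vectors

Assume $a \in \mathbb{R}$, $a \neq 0$, and take the following set of vectors:
$$\varphi_0 = \begin{bmatrix} 0 \\ a \end{bmatrix}, \qquad \varphi_1 = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad \varphi_2 = \begin{bmatrix} -\cos\theta \\ \sin\theta \end{bmatrix}.$$

For which values of $\theta$ and $a$ is the above set a tight frame? Draw a few representative cases.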

1.56. Tight Frame of Affine Functions

In $\mathcal{L}^2([0, 1])$, consider the subspace $S$ of affine functions. Find a 4-element tight frame $\Phi$ for $S$ that includes $\varphi_0(t) = 1$ and $\varphi_1(t) = \sqrt{3}\,t$. Also find the frame bound $A$ and the canonical dual frame.
Hint: A union of orthonormal bases is always a tight frame.

1.57. Relation Between Bases

Given are two different bases $\Phi = \{\varphi_k\}_{k\in\mathbb{Z}}$, $\Psi = \{\psi_k\}_{k\in\mathbb{Z}}$, for a Hilbert space $H$. Show how to express one in terms of the other.

1.58. Change of Basis

Given are two different bases $\Phi = \{\varphi_k\}_{k\in\mathbb{Z}}$, $\Psi = \{\psi_k\}_{k\in\mathbb{Z}}$, for a Hilbert space $H$. Show how to change the representation of a given $x \in H$ from one basis to the other, that is,
$$x = \sum_{k\in\mathcal{K}} \alpha_k \varphi_k = \sum_{k\in\mathcal{K}} \beta_k \psi_k.$$

Calculate it when $\Phi$ and $\Psi$ are
$$\Phi = [\,\cdots\,], \qquad \Psi = [\,\cdots\,].$$
1.59. Complex Multiplication

Multiplying two complex numbers
$$e + jf = (a + jb)(c + jd)$$
can be written as
$$\begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} c & -d \\ d & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix},$$
which takes 4 multiplications and 2 additions. Show that this can be computed with 3 multiplications and 5 additions.
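One well-known arrangement (often attributed to Gauss) that meets the count asked for in Exercise 1.59 is sketched below in plain Python; other arrangements with 3 multiplications exist, and the function name `complex_mult_3` is ours. Each comment records which of the 3 multiplications and 5 additions (counting subtractions as additions) a line uses.

```python
def complex_mult_3(a, b, c, d):
    """Compute (a + jb)(c + jd) = e + jf with 3 real multiplications and 5 additions."""
    k1 = c * (a + b)        # multiplication 1 (addition 1: a + b)
    k2 = a * (d - c)        # multiplication 2 (addition 2: d - c)
    k3 = b * (c + d)        # multiplication 3 (addition 3: c + d)
    e = k1 - k3             # addition 4: ca + cb - bc - bd = ac - bd
    f = k1 + k2             # addition 5: ca + cb + ad - ac = bc + ad
    return e, f

print(complex_mult_3(1, 2, 3, 4))
```

For example, `complex_mult_3(1, 2, 3, 4)` returns `(-5, 10)`, matching $(1 + 2j)(3 + 4j) = -5 + 10j$.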
1.60. Gaussian Elimination

Our aim is to solve the system of linear equations $Ax = y$ (general conditions for existence of a solution are given in Appendix 1.B.1). Comment on whether the solution to each of the following systems of equations exists, and if it does, find it.

(i) $[\,\cdots\,]$
(ii) $[\,\cdots\,]$
(iii) $[\,\cdots\,]$

1.61. Kaczmarz's Algorithm

Consider Example 1.63 and prove the following facts when the size-$N \times N$ matrix $A$ is of rank $N$, and thus the set of rows $r_n$, $n = 0, 1, \ldots, N-1$, is linearly independent.

(i) Show that the intersection of the affine subspaces $S_n$, $n = 0, 1, \ldots, N-1$, is a single point and that this point is the solution of the linear system $Ax = b$.
(ii) Prove that when the rows are mutually orthogonal, Kaczmarz's algorithm converges in $N$ steps, that is, in only one sweep.
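Example 1.63 is not reproduced in this section, so the sketch below assumes the standard Kaczmarz update: at step $n$, project the current iterate orthogonally onto the affine hyperplane $S_n = \{x : \langle r_n, x\rangle = b_n\}$. With mutually orthogonal rows, a single sweep reaches the solution, as claimed in (ii). NumPy is assumed, and the matrix is an arbitrary example with orthogonal but unnormalized rows.

```python
import numpy as np

def kaczmarz_sweep(A, b, x0):
    """One sweep of Kaczmarz's method over the rows r_n of A (assumed standard update)."""
    x = np.array(x0, dtype=float)
    for r_n, b_n in zip(A, b):
        # Orthogonal projection of x onto the affine hyperplane S_n = {x : <r_n, x> = b_n}.
        x = x + (b_n - r_n @ x) / (r_n @ r_n) * r_n
    return x

# Mutually orthogonal (not normalized) rows.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 0.0, 2.0]])
x_true = np.array([3.0, -1.0, 0.5])
b = A @ x_true
x1 = kaczmarz_sweep(A, b, np.zeros(3))
print(np.allclose(x1, x_true))   # one sweep recovers the solution here
```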

1.62. Sums and Products 

Compute the following sums and products: 

« nr= 1 ex P (^ rj ) 

(ii) E ^exp(i2f ) 

(iii) $\sum_{k=0}^{\infty} a^k z^{-k}$ (specify also the region of convergence, meaning the values of $z$ for which there is convergence)

1.63. Convergence of Sequences

Let $\{a_k\}_{k=0}^{\infty}$ converge to $a$ and $\{b_k\}_{k=0}^{\infty}$ converge to $b$. Prove the following:

(i) If $c$ is some real number, $ca_0, ca_1, \ldots$ converges to $ca$.
(ii) $a_0 + b_0, a_1 + b_1, \ldots$ converges to $a + b$.
(iii) $a_0 b_0, a_1 b_1, \ldots$ converges to $ab$.

(iv) If $a_k \neq 0$ for each $k \in \mathbb{N}$ and $a \neq 0$, $b_0/a_0, b_1/a_1, \ldots$ converges to $b/a$.
1.64. Convergence Tests

Let $\{a_k\}_{k=1}^{\infty}$ and $\{b_k\}_{k=1}^{\infty}$ be sequences that satisfy $0 \le a_k \le b_k$ for every $k \in \mathbb{Z}^+$. Then

• if $\sum_{k=1}^{\infty} b_k$ converges, $\sum_{k=1}^{\infty} a_k$ converges as well; and

• if $\sum_{k=1}^{\infty} a_k$ diverges, $\sum_{k=1}^{\infty} b_k$ diverges as well.

The ratio test for convergence of $\sum_{k=1}^{\infty} a_k$ states that, given
$$\lim_{k\to\infty} \left| \frac{a_{k+1}}{a_k} \right| = L, \qquad \text{(P1.64-1)}$$

• if $0 \le L < 1$, the series converges absolutely;

• if $1 < L$ or $L = \infty$, the series diverges; and

• if $L = 1$, the test is inconclusive: the series could converge or diverge, and convergence needs to be determined another way.

Based on the above, determine whether $\sum_{k=1}^{\infty} c_k$ converges for each of the following sequences:

(i) $c_k = k^2/(2k^4 - 3)$.

(ii) $c_k = \log k / k$.

(iii) $c_k = k^k / k!$.

(iv) $c_k = a^k / k!$.

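1.65. Useful Series

In this exercise, we explore a few useful series.

(i) Finite Sum: Prove the following formula, valid for $t \neq 1$:
$$\sum_{k=0}^{N-1} t^k = \frac{1 - t^N}{1 - t}. \qquad \text{(P1.65-1)}$$
(ii) Geometric Series: Determine conditions on when
$$\sum_{k=0}^{\infty} t^k \qquad \text{(P1.65-2)}$$
converges. Prove that when it converges its sum is given by
$$\frac{1}{1 - t}. \qquad \text{(P1.65-3)}$$
(iii) Power Series: Determine whether
$$\sum_{k=1}^{\infty} a_k t^k \qquad \text{(P1.65-4)}$$
converges, as well as when and how.
(iv) Taylor Series: If a function $x(t)$ has $(n + 1)$ continuous derivatives, then it can be expanded into a Taylor series around a point $t_0$ as follows:
$$x(t) = \sum_{k=0}^{n} \frac{(t - t_0)^k}{k!} x^{(k)}(t_0) + R_n, \qquad R_n = \frac{(t - t_0)^{n+1}}{(n+1)!} x^{(n+1)}(\xi), \qquad \text{(P1.65-5)}$$
for some $\xi$ between $t$ and $t_0$. Find the Taylor series expansion of $x(t) = 1/(1 - t)$ around a point $t_0$.

$$\begin{array}{ll|ll}
\text{Function} & \text{Expansion} & \text{Function} & \text{Expansion} \\ \hline
\sin t & \sum_{k=0}^{\infty} (-1)^k \dfrac{t^{2k+1}}{(2k+1)!} & \cos t & \sum_{k=0}^{\infty} (-1)^k \dfrac{t^{2k}}{(2k)!} \\
e^t & \sum_{k=0}^{\infty} \dfrac{t^k}{k!} & a^t & \sum_{k=0}^{\infty} \dfrac{(t \ln a)^k}{k!} \\
\sinh t & \sum_{k=0}^{\infty} \dfrac{t^{2k+1}}{(2k+1)!} & \cosh t & \sum_{k=0}^{\infty} \dfrac{t^{2k}}{(2k)!} \\
\ln(1 + t) & \sum_{k=1}^{\infty} (-1)^{k+1} \dfrac{t^k}{k} & \ln\dfrac{1 + t}{1 - t} & 2\sum_{k=1}^{\infty} \dfrac{t^{2k-1}}{2k - 1}
\end{array}$$

Table P1.65-1: Useful MacLaurin series expansions.

(v) MacLaurin Series: For $t_0 = 0$, the Taylor series is called the MacLaurin series:
$$x(t) = \sum_{k=0}^{n} \frac{t^k}{k!} x^{(k)}(0) + R_n. \qquad \text{(P1.65-6)}$$

Find the MacLaurin series expansion of $x(t) = 1/(1 - t)$.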
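Formula (P1.65-1) is easy to sanity-check with exact rational arithmetic before proving it. The sketch below uses Python's standard `fractions` module, so there is no rounding error; the test values of $t \neq 1$ and the range of $N$ are arbitrary.

```python
from fractions import Fraction

def geometric_sum_direct(t, N):
    """Left-hand side of (P1.65-1): sum_{k=0}^{N-1} t^k, computed term by term."""
    return sum(t**k for k in range(N))

def geometric_sum_closed(t, N):
    """Right-hand side of (P1.65-1): (1 - t^N) / (1 - t), valid for t != 1."""
    return (1 - t**N) / (1 - t)

for t in (Fraction(1, 2), Fraction(-3, 7), Fraction(5)):
    for N in range(0, 8):
        assert geometric_sum_direct(t, N) == geometric_sum_closed(t, N)
print("closed form matches the direct sum for all tested t and N")
```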



1.66. Eigenvalues and Eigenvectors
Consider the matrices
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} \alpha & \beta \\ \beta & \alpha \end{bmatrix},$$
where $\alpha$ and $\beta$ cannot be equal to zero at the same time.

(i) Give the eigenvalues and eigenvectors for $A$ and $B$ (make sure that the eigenvectors are of norm 1).
(ii) Show that $A = VDV^T$ where the columns of $V$ correspond to the eigenvectors of $A$ and $D$ is a diagonal matrix whose main diagonal corresponds to the eigenvalues of $A$. Is $V$ a unitary matrix? Why?
(iii) Check your results using the built-in Matlab function eig.
(iv) Compute the determinants of $A$ and $B$. Is $A$ invertible? If it is, give its inverse; if not, say why.
(v) When is the matrix $B$ invertible? Compute $B^{-1}$ when it exists.
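For readers working in Python rather than Matlab, `numpy.linalg.eigh` can play the role of `eig` in part (iii) for the symmetric matrices of this exercise; it returns eigenvalues in ascending order and orthonormal eigenvectors as columns. The sample values $\alpha = 2$, $\beta = 1$ for $B$ are an arbitrary choice for illustration, and `B_matrix` is a hypothetical helper name.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
lam, V = np.linalg.eigh(A)        # ascending eigenvalues; orthonormal eigenvectors as columns of V
D = np.diag(lam)
print(lam)
print(np.allclose(V @ D @ V.T, A))          # A = V D V^T
print(np.allclose(V.T @ V, np.eye(2)))      # V is orthogonal (a real unitary matrix)
print(np.linalg.det(A))

def B_matrix(alpha, beta):
    """The matrix B of this exercise for given numerical alpha, beta."""
    return np.array([[alpha, beta],
                     [beta, alpha]], dtype=float)

lamB, VB = np.linalg.eigh(B_matrix(2.0, 1.0))   # alpha = 2, beta = 1 chosen arbitrarily
print(lamB)
```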

1.67. Operator Norm, Singular Values and Eigenvalues

For matrix $A$ (bounded linear operator), show the following:

(i) If the matrix $A$ is Hermitian, for every nonnegative eigenvalue $\lambda_k$, there is an identical singular value $\sigma_k = \lambda_k$. For every negative eigenvalue $\lambda_k$, there is a corresponding singular value $\sigma_k = |\lambda_k|$.

(ii) If the matrix $A$ is Hermitian with eigenvalues $\lambda_k$,
$$\|A\|_2 = \max_k |\lambda_k|.$$
(iii) Call $\mu_k$ the eigenvalues of $A^* A$; then
$$\sqrt{\max_k \mu_k} = \max_k \sigma_k.$$

1.68. Least-Squares Solution to a Linear System of Equations
The general solution to this problem was given in (1.208).

(i) Show that if $y$ belongs to the column space of $A$, then $\hat{y} = y$.
(ii) Show that if $y$ is orthogonal to the column space of $A$, then $\hat{y} = 0$.



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavclets.org 



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovacevic, and V. K. Goyal 



(iii) Show that for the least-squares solution, the partial derivatives $\partial(\|y - \hat{y}\|^2)/\partial x_i$ are all zero.

1.69. Power of a Matrix

Given a square, invertible matrix $A$, find an expression for $A^k$ as a function of the eigenvectors and eigenvalues of $A$.
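A numerical check of the eigendecomposition route is sketched below with NumPy, assuming $A$ is diagonalizable (it has a full set of linearly independent eigenvectors). The helper `matrix_power_eig` is a hypothetical name and the test matrix is arbitrary. Because $A$ is invertible, all eigenvalues are nonzero, so the same formula also covers negative $k$.

```python
import numpy as np

def matrix_power_eig(A, k):
    """Hypothetical helper: A^k = V diag(lambda_i^k) V^{-1}, assuming A is diagonalizable."""
    lam, V = np.linalg.eig(A)              # eigenvalues and eigenvectors (columns of V)
    return V @ np.diag(lam ** k) @ np.linalg.inv(V)

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                 # distinct eigenvalues 2 and 3, hence diagonalizable
print(np.allclose(matrix_power_eig(A, 5), np.linalg.matrix_power(A, 5)))
print(np.allclose(matrix_power_eig(A, -2), np.linalg.inv(A) @ np.linalg.inv(A)))
```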

1.70. Conditional Distributions of Jointly Gaussian Vectors
Let $[y^T\ z^T]^T$ be a jointly Gaussian vector with
$$E\left[\begin{bmatrix} y \\ z \end{bmatrix}\right] = \begin{bmatrix} \mu_y \\ \mu_z \end{bmatrix} \qquad \text{and} \qquad \operatorname{cov}\left(\begin{bmatrix} y \\ z \end{bmatrix}\right) = \begin{bmatrix} \Sigma_y & \Sigma_{yz} \\ \Sigma_{zy} & \Sigma_z \end{bmatrix}.$$

(i) Let $A$ be a matrix with dimensions such that $Ay$ is defined. Show that $Ay$ is a jointly Gaussian vector with mean $A\mu_y$ and covariance matrix $A\Sigma_y A^T$.

(ii) Why must we have $\Sigma_{yz} = \Sigma_{zy}^T$ and $\Sigma_y = \Sigma_y^T$?

(iii) Using the joint PDF of $[y^T\ z^T]^T$, find the joint PDF of $y$, thus showing that $y$ is jointly Gaussian with mean $\mu_y$ and covariance $\Sigma_y$.

(iv) Using the joint PDF of $[y^T\ z^T]^T$ and the joint PDF of $z$, show that the conditional distribution of $y$ given $z = t$ is jointly Gaussian with mean $\mu_y + \Sigma_{yz}\Sigma_z^{-1}(t - \mu_z)$ and covariance $\Sigma_y - \Sigma_{yz}\Sigma_z^{-1}\Sigma_{zy}$.

Chapter 2

Sequences and
Discrete-Time Systems



Contents

Further Reading 303

Exercises with Solutions 304

Exercises 310

Time is ordered, from past to future. Any countably infinite set of times can be indexed by the integers to maintain this order, and associating the integers with discrete time prompts us to refer to doubly infinite sequences as discrete-time signals. As we saw in the previous chapter, these sequences form the vector space $\mathbb{C}^{\mathbb{Z}}$ (assuming they are complex-valued). Operators that map a sequence to a sequence are called discrete-time systems.

Some important classes of sequences and discrete-time systems have physical interpretations. For example, restrictions of sequences to the normed vector spaces $\ell^2(\mathbb{Z})$ and $\ell^\infty(\mathbb{Z})$ correspond to the physical phenomena of finite energy and boundedness. When the discretization of time is to evenly spaced points, the constancy of physical laws corresponds to a shift-invariance property for discrete-time systems. Linearity and shift invariance allow a system to be described uniquely by convolution with the system's impulse response. Once the convolution operation is defined, spectral theory allows us to construct an appropriate Fourier transform. Shift-invariant systems, convolution, and the discrete-time Fourier transform also have myriad uses that need not have a physical underpinning.

The above discussion implicitly assumed that the underlying domain, time, is 
infinite. In practice we glimpse a finite portion of time. Although it might seem 
that dealing with finite time would be easier, the tools we develop do not handle a 
beginning or an end to time. We discuss handling a finite amount of data throughout 
the chapter. 

2.1 Introduction

While sequences in the real world are often one-sided infinite (they start at some initial point and then go on), for mathematical convenience, we look at them as two-sided infinite:$^{33}$
$$x = [\,\cdots\ x_{-2}\ x_{-1}\ \boxed{x_0}\ x_1\ x_2\ \cdots\,]^T. \qquad (2.1)$$

For example, if you measure some physical quantity every day at some fixed time (for example, the temperature in degrees at noon in front of your house), you obtain a sequence starting at time 0 (say January 14th) and continuing to infinity,
$$x = [\,\cdots\ \boxed{32}\ 29\ 30\ \cdots\,]^T.$$

Implicit in the index is the fact that $x_n$ corresponds to the temperature (at noon) on the $n$th day. A sequence is also known under the names discrete-time signal (in signal processing) or time series (in statistics); mathematically, these are all vectors, most often in an infinite-dimensional Hilbert space.

In real life, we observe only a finite portion of an infinite-length sequence. Moreover, computations are always done on finite inputs. For example, consistent temperature recordings started in the 18th century and necessarily stop at the present time, producing a sequence of length $N$ for some finite $N \in \mathbb{N}$:
$$x = [x_0\ x_1\ \cdots\ x_{N-1}]^T. \qquad (2.2)$$

Having only this data but methods that apply to all time, what do we do about days with no measurements? In effect, we are forced to assign some values. While we are limited only by our imaginations, two techniques stand out.

The first technique is to set $x_n = 0$ for all $n$ outside of $\{0, 1, \ldots, N-1\}$. This is natural because, for any subsequent computation that uses $x_n$ values linearly, this extension by zeros is equivalent to simply omitting measurements that are not available. However, the results agree with those we would obtain from more data only when the signal truly is zero everywhere outside of $\{0, 1, \ldots, N-1\}$.

33 The boxing of the time origin is intended to serve as a reference point, essential when dealing with infinite vectors/matrices.

A second, less obvious technique is to extend the signal circularly,$^{34}$ that is, to periodize the finite-length sequence, treating the observed values as one period of a periodic sequence of period $N \in \mathbb{N}$:
$$x = [\,\cdots\ \boxed{x_0}\ x_1\ \cdots\ x_{N-1}\ x_0\ x_1\ \cdots\,]^T. \qquad (2.3)$$



While finite-length sequences in (2.2) and infinite-length periodic ones in (2.3) have a fundamentally different character, we will use the same types of tools to analyze them. Techniques designed explicitly for finite-length sequences are mathematically rooted in treating the sequence as one period of an infinite-length periodic sequence. The consequences of this implicit periodization are central in digital signal processing.
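The two extensions discussed above can be contrasted in a few lines. The helper `extend` below is a hypothetical name used only in this NumPy sketch: it returns the value of a finite sequence $x_0, \ldots, x_{N-1}$ at arbitrary integer times under either extension by zeros or the circular extension (2.3).

```python
import numpy as np

def extend(x, n, mode):
    """Hypothetical helper: values at integer times n of x_0..x_{N-1} extended to all of Z."""
    x = np.asarray(x)
    N = len(x)
    n = np.asarray(n)
    if mode == "zero":
        # Extension by zeros: x_n = 0 outside {0, ..., N-1}.
        inside = (n >= 0) & (n < N)
        return np.where(inside, x[np.clip(n, 0, N - 1)], 0)
    if mode == "periodic":
        # Circular (periodic) extension with period N, as in (2.3).
        return x[np.mod(n, N)]
    raise ValueError("mode must be 'zero' or 'periodic'")

x = [4, 1, 7]
n = np.arange(-3, 6)
print(extend(x, n, "zero"))
print(extend(x, n, "periodic"))
```

With $x = [4\ 1\ 7]$ and $n = -3, \ldots, 5$, the zero extension gives `[0 0 0 4 1 7 0 0 0]` and the periodic one gives `[4 1 7 4 1 7 4 1 7]`.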

These considerations allow us to define two broad classes of sequences for which to develop our tools:

(i) Infinite-length sequences are the vector space $\mathbb{C}^{\mathbb{Z}}$ of sequences with domain $\mathbb{Z}$, as defined in (1.17b). The support of a sequence may be a proper subset of $\mathbb{Z}$; for example, we will often consider infinite-length sequences that are nonzero only at nonnegative times.

(ii) Finite-length sequences, without loss of generality, have support in $\{0, 1, \ldots, N-1\}$. The tools we will develop do not treat the vector space of finite-length sequences as $\mathbb{C}^N$ generically, but rather as sequences defined on a circular domain.

Example 2.1 (Sequences)

(i) Infinite-length sequences: The geometric sequence (P1.65-2) with $t = 1/2$,
$$x_n = \left(\tfrac{1}{2}\right)^n, \quad n \in \mathbb{Z}, \qquad \text{or} \qquad (2.4a)$$
$$x = [\,\cdots\ 4\ 2\ \boxed{1}\ \tfrac{1}{2}\ \tfrac{1}{4}\ \cdots\,]^T, \qquad (2.4b)$$
is of infinite length and does not have finite $\ell^1$, $\ell^2$, or $\ell^\infty$ norm. If we made it nonzero only for $n \ge 0$, all these norms would be finite.
(ii) Finite-length sequences: A sequence obtained by observing $N$ tosses of a fair coin, recording 0 for heads and 1 for tails,
$$x = [\,0\ 1\ 1\ 1\ \cdots\ 0\,]^T,$$
is of finite length. There is no extension of this sequence outside of the $N$ observed values that is particularly natural.

34 Another name for a circular extension is a periodic extension.

A sinusoidal function sampled at $N$ samples per period,
$$x_n = \sin\left(\frac{2\pi}{N} n + \theta\right), \quad n \in \mathbb{Z}, \qquad \text{or}$$
$$x = [\,\cdots\ \underbrace{\boxed{\sin(\theta)}\ \ \sin\left(\tfrac{2\pi}{N} + \theta\right)\ \cdots\ \sin\left(\tfrac{2\pi}{N}(N-1) + \theta\right)}_{\text{one period}}\ \ \sin(\theta)\ \cdots\,]^T,$$
is an infinite-length periodic sequence. Taking $N$ samples
$$x = [x_0\ x_1\ \cdots\ x_{N-1}]^T$$
gives a finite-length sequence for which circular extension is quite natural.

Given sequences (signals, vectors), one can apply operators (systems, filters). These map input sequences into output ones, and since they involve discrete-time sequences, they are usually called discrete-time systems (operators). Among them, linear ones are the most common. Even more restricted is the class of linear shift-invariant systems (filters, defined later in the chapter), an example of which is the moving average filter:

Example 2.2 (Moving average filter) Consider our temperature example, and assume we want to detect seasonal trends. The day-by-day variation might be too erratic, and thus, we compute a local average
$$y_n = \frac{1}{N} \sum_{k=-(N-1)/2}^{(N-1)/2} x_{n-k}, \quad n \in \mathbb{Z}, \qquad (2.5)$$
where $N$ is a small, odd integer. The local average reduces daily variations; this simple filter is linear and shift invariant (defined later in the chapter), since at all $n$, the same local averaging is performed.
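The moving average (2.5) translates directly into code. The sketch below assumes NumPy, and `moving_average` is a hypothetical name; it handles a finite input by extending it with zeros, which is the first of the two extension techniques discussed earlier in this chapter.

```python
import numpy as np

def moving_average(x, N):
    """Centered moving average (2.5): y_n = (1/N) * sum_{k=-(N-1)/2}^{(N-1)/2} x_{n-k}.

    Samples outside the given array are taken to be zero (extension by zeros)."""
    if N < 1 or N % 2 == 0:
        raise ValueError("N must be a positive odd integer")
    x = np.asarray(x, dtype=float)
    M = (N - 1) // 2
    padded = np.concatenate([np.zeros(M), x, np.zeros(M)])
    y = np.empty_like(x)
    for n in range(len(x)):
        # padded[n : n + N] holds x_{n-M}, ..., x_{n+M}
        y[n] = padded[n:n + N].sum() / N
    return y

x = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
print(moving_average(x, 3))
```

Shifting the input away from the array edges shifts the output by the same amount, which is the shift invariance mentioned in the example.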

Chapter Outline

The next several sections follow the progression of topics in this brief introduction: In Section 2.2, we start by formally defining the various types of sequences we discussed above. Section 2.3 considers linear discrete-time systems, especially of the shift-invariant kind, which correspond to difference equations, the discrete-time analogue of differential equations. Next, in Sections 2.4 through 2.6, we develop the tools to analyze discrete-time sequences and systems, in particular the discrete-time Fourier transform, the z-transform, and the discrete Fourier transform. We discuss the fundamental result relating filtering to multiplication in the Fourier domain: the convolution property. Section 2.7 looks into discrete-time sequences and systems that operate with different rates, called multirate systems, which are key for filter-bank development in later chapters. This is followed by discrete stochastic processes and systems in Section 2.8, while important algorithms for discrete-time processing, such as the fast Fourier transform, are covered in Section 2.9. Appendix 2.A lists basic elements of complex analysis, while Appendix 2.B discusses some elements of algebra, in particular, polynomial sequences.

Notation used in this chapter: We assume sequences to be complex in general, at the risk of a bit more cumbersome notation at times. Thus, Hermitian transposition is used often. We will be using $\|\cdot\|$ to denote the 2 norm; any other norm, such as the 1 norm, $\|\cdot\|_1$, will be explicitly specified.



2.2 Sequences

2.2.1 Infinite-Length Sequences

The set of sequences in (2.1), where $x_n$ is either real or complex, together with vector addition and scalar multiplication, forms a vector space (see Definition 1.1). The inner product between two infinite-length sequences is defined in (1.20b) and induces the standard $\ell^2$ (or Euclidean) norm (1.23b). Other norms of interest are the $\ell^1$ norm from (1.36a) with $p = 1$, and the $\ell^\infty$ norm from (1.36b).

As opposed to generic infinite-dimensional spaces, where ordering of indices does not matter in general, discrete-time sequences belong to an infinite-dimensional space where ordering of indices is important since it represents time. Note that in some instances later in the book, we will be dealing with vectors of sequences, for example, $x = [x_0\ x_1]^T$ where $x_0$ and $x_1$ are sequences as well. We now look into a few spaces of interest.

Sequence Spaces

Space of Square-Summable Sequences $\ell^2(\mathbb{Z})$ The constraint of a finite square norm is necessary for turning the vector space $\mathbb{C}^{\mathbb{Z}}$ defined in (1.17b) into the Hilbert space of finite-energy sequences $\ell^2(\mathbb{Z})$. This space affords a geometric view; we now recall a few such geometric facts from Chapter 1.

(i) The inner product between two vectors is
$$\langle x, y\rangle = \|x\|\,\|y\|\cos\theta,$$
where $\theta$ is the angle between the two infinite-length sequences (vectors).
(ii) As in Definition 1.8, if the inner product is zero,
$$\langle x, y\rangle = 0,$$
the sequences are said to be orthogonal to each other.
(iii) As in (1.61), given a unit-norm sequence $y$, $\|y\| = 1$,
$$\hat{x} = \langle x, y\rangle\, y$$
is the orthogonal projection of the sequence $x$ onto the space spanned by the sequence $y$.
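The projection in (iii) can be illustrated with finite-support sequences stored as arrays over a window of indices, with zeros assumed elsewhere. This NumPy sketch assumes the convention $\langle x, y\rangle = \sum_n x_n y_n^*$, which it computes as `np.vdot(y, x)` because `np.vdot` conjugates its first argument; the particular $x$ and $y$ are arbitrary.

```python
import numpy as np

# Finite-support sequences stored over the index window n = 0, ..., 7 (zero elsewhere).
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.0, 0.5, 1.0])
y = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
y = y / np.linalg.norm(y)                 # make y unit norm, ||y|| = 1

inner = np.vdot(y, x)                     # <x, y> = sum_n x_n conj(y_n)
x_hat = inner * y                         # orthogonal projection of x onto span(y)
residual = x - x_hat

cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))   # angle in (i), real case
print(x_hat)
print(np.isclose(np.vdot(y, residual), 0.0))   # residual is orthogonal to y
```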
Space of Bounded Sequences $\ell^\infty(\mathbb{Z})$ A looser constraint than finite energy is to bound the magnitude of the samples. The space of bounded sequences contains all sequences $x_n$ such that, for some finite $M$, $|x_n| \le M$ for all $n \in \mathbb{Z}$. This space is denoted $\ell^\infty(\mathbb{Z})$ since it consists of sequences with finite $\ell^\infty$ norm.

Space of Absolutely-Summable Sequences $\ell^1(\mathbb{Z})$ A more restrictive constraint than finite energy is to require absolute summability (remember that $\ell^1(\mathbb{Z}) \subset \ell^2(\mathbb{Z})$ from (1.37)). By definition, sequences in $\ell^1(\mathbb{Z})$ have a finite $\ell^1$ norm.

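Example 2.3 (Sequence spaces) Revisiting the geometric sequence such as the one in (2.4), we see that for $a \in \mathbb{R}$,
$$x_n = \begin{cases} 0, & \text{for } n < 0; \\ a^n, & \text{for } n \ge 0, \end{cases} \qquad (2.6)$$
is in the following spaces:
$$x \in \begin{cases} \ell^2(\mathbb{Z}),\ \ell^1(\mathbb{Z}),\ \ell^\infty(\mathbb{Z}), & \text{for } |a| < 1; \\ \ell^\infty(\mathbb{Z}), & \text{for } |a| = 1; \\ \text{none}, & \text{for } |a| > 1. \end{cases}$$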

Special Sequences

We now introduce the sequences most often used in the book.

Kronecker Delta Sequence The simplest nonzero sequence is the Kronecker delta sequence,
$$\delta_n = \begin{cases} 1, & \text{for } n = 0; \\ 0, & \text{otherwise}, \end{cases} \quad n \in \mathbb{Z}, \qquad \text{or} \qquad (2.7a)$$
$$\delta = [\,\cdots\ 0\ \boxed{1}\ 0\ \cdots\,]^T. \qquad (2.7b)$$

Shifting the single 1 in the sequence to position $k$ gives what is called the Kronecker delta sequence at location $k$, which is $\delta_{n-k}$. The set of Kronecker delta sequences $\{\delta_{n-k}\}_{k\in\mathbb{Z}}$ along the discrete time line forms an orthonormal basis for $\ell^2(\mathbb{Z})$; we called it the standard basis in Chapter 1. Table 2.1 lists some properties of the Kronecker delta sequence. (The shifting property uses convolution, which is defined in (2.39).)

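Sinc Function and Sequences The sinc function appears frequently in signal processing and approximation. It is defined as
$$\operatorname{sinc}(t) = \begin{cases} \dfrac{\sin t}{t}, & t \neq 0; \\ 1, & t = 0. \end{cases} \qquad (2.8a)$$

Evaluation of $\lim_{t\to 0} \operatorname{sinc}(t)$ using l'Hôpital's rule confirms the continuity of the function. Scaling the sinc function with $1/\sqrt{\pi}$ makes it of unit norm, that is,
$$\left\| \frac{1}{\sqrt{\pi}} \operatorname{sinc}(t) \right\| = 1. \qquad (2.8b)$$

$$\begin{array}{ll}
\text{Kronecker delta sequence} & \\ \hline
\text{Normalization} & \sum_{n\in\mathbb{Z}} \delta_n = 1 \\
\text{Sifting} & \sum_{n\in\mathbb{Z}} x_{n_0 - n}\, \delta_n = \sum_{n\in\mathbb{Z}} \delta_{n_0 - n}\, x_n = x_{n_0} \\
\text{Shifting} & \delta_{n - n_0} * x_n = x_{n - n_0} \\
\text{Sampling} & x_n \delta_n = x_0 \delta_n \\
\text{Restriction} & x_n \delta_n = (1_{\{0\}} x)_n
\end{array}$$

Table 2.1: Properties of the Kronecker delta sequence.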




sinc(7m/2) 



W 



is 



-10 N -U6 X 7 2 



a / ff^f 7 To 



(a) (b) 

Figure 2.1: (a) The sine function sinc(i). (b) The sine sequence sinc(7m/2). 



The sinc function is zero at $t_n = n\pi$, for $n \neq 0$; together with the value at $t = 0$, this gives
$$\operatorname{sinc}(n\pi) = \delta_n, \quad n \in \mathbb{Z}. \qquad (2.8c)$$

The sinc function is illustrated in Figure 2.1(a).

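For any positive $T$, we can obtain a sinc sequence
$$\frac{1}{\sqrt{T}} \operatorname{sinc}(\pi n/T) = \frac{1}{\sqrt{T}}\, \frac{\sin(\pi n/T)}{\pi n/T}, \quad n \in \mathbb{Z}. \qquad (2.9)$$

This sequence is in $\ell^\infty(\mathbb{Z})$ and in $\ell^2(\mathbb{Z})$, and for $T \ge 1$ it is of unit norm. For $T > 1$, it is not in $\ell^1(\mathbb{Z})$ since it decays as $1/n$ (see Example 1.8 illustrating the inclusion property (1.37) of $\ell^p$ spaces); for $T = 1$, it reduces to $\delta_n$ by (2.8c). It is zero whenever $n/T$ is a nonzero integer; this is illustrated in Figure 2.1(b) for $T = 2$.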
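The unit-norm property of (2.9) can be spot-checked by truncating the sum $\sum_n \frac{1}{T}\operatorname{sinc}^2(\pi n/T)$. One pitfall: NumPy's `np.sinc(u)` is the normalized $\sin(\pi u)/(\pi u)$, so $\operatorname{sinc}(\pi n/T)$ in the sense of (2.8a) is `np.sinc(n / T)`. The truncation length and the values $T \in \{1, 2, 3.5\}$ (all with $T \ge 1$) are arbitrary choices; the neglected tail beyond $|n| = 2 \cdot 10^5$ is small but nonzero. The helper name `sinc_sequence` is hypothetical.

```python
import numpy as np

def sinc_sequence(n, T):
    """Hypothetical helper: (1/sqrt(T)) sinc(pi n / T), with sinc(t) = sin(t)/t as in (2.8a).

    np.sinc(u) computes sin(pi u)/(pi u), so sinc(pi n / T) equals np.sinc(n / T)."""
    return np.sinc(n / T) / np.sqrt(T)

n = np.arange(-200_000, 200_001)
for T in (1.0, 2.0, 3.5):
    energy = np.sum(sinc_sequence(n, T) ** 2)   # truncated estimate of the squared norm
    print(T, energy)
```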



Heaviside Sequence The Heaviside or unit-step sequence is defined as
$$u_n = \begin{cases} 1, & \text{for } n \in \mathbb{N}; \\ 0, & \text{otherwise}, \end{cases} \quad n \in \mathbb{Z}, \qquad \text{or} \qquad (2.10a)$$
$$u = [\,\cdots\ 0\ \boxed{1}\ 1\ \cdots\,]^T. \qquad (2.10b)$$

This sequence is bounded by 1, so it belongs to $\ell^\infty(\mathbb{Z})$. It belongs to neither $\ell^1(\mathbb{Z})$ nor $\ell^2(\mathbb{Z})$. The Kronecker delta and Heaviside sequences are related via
$$u_n = \sum_{k=-\infty}^{n} \delta_k.$$

Pointwise multiplication by the Heaviside sequence implements the domain restriction operator (1.57) for restriction from all the integers to just the nonnegative integers:

$$ (1_{\mathbb{N}}x)_n = u_n x_n = \begin{cases} x_n, & \text{for } n\in\mathbb{N}; \\ 0, & \text{otherwise}, \end{cases} \qquad n\in\mathbb{Z}. $$

From this we can also build other domain restriction operators. For example, domain restriction to $\{n_0, n_0+1, \ldots, n_1\}$ is achieved with a difference of two shifted Heaviside sequences:

$$ (1_{\{n_0,\ldots,n_1\}}x)_n = (u_{n-n_0} - u_{n-n_1-1})\,x_n = \begin{cases} x_n, & \text{for } n\in\{n_0,\ldots,n_1\}; \\ 0, & \text{otherwise}. \end{cases} \tag{2.11} $$
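A small Python sketch, assuming sequences are represented as functions of the integer index, that checks $u_n = \sum_{k\le n}\delta_k$ on a finite range and applies the restriction to $\{n_0,\ldots,n_1\}$ built from two shifted Heaviside sequences. The test sequence $x_n = n^2$ and all function names are hypothetical.

```python
def delta(n):
    """Kronecker delta sequence."""
    return 1 if n == 0 else 0

def u(n):
    """Heaviside (unit-step) sequence: 1 for n in N = {0, 1, 2, ...}."""
    return 1 if n >= 0 else 0

def u_from_delta(n, lowest=-100):
    """u_n as the running sum of delta_k for k <= n; truncating the sum at
    `lowest` loses nothing here because delta_k = 0 for k < 0."""
    return sum(delta(k) for k in range(lowest, n + 1))

def restrict(x, n0, n1, indices):
    """Restriction of x to {n0, ..., n1} via the factor u_{n-n0} - u_{n-n1-1}."""
    return {n: (u(n - n0) - u(n - n1 - 1)) * x(n) for n in indices}

def x(n):
    return n * n            # hypothetical test sequence x_n = n^2

print(all(u_from_delta(n) == u(n) for n in range(-10, 11)))
print(restrict(x, -1, 2, range(-4, 5)))
```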

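Box and Window Sequences For any positive integer $n_0$, the (unnormalized) right-sided box sequence is defined as

$$ w_n = \begin{cases} 1, & \text{for } 0 \le n \le n_0 - 1; \\ 0, & \text{otherwise}, \end{cases} \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.12a} $$
$$ w = \begin{bmatrix} \cdots & 0 & \boxed{1} & 1 & \cdots & 1 & 0 & \cdots \end{bmatrix}^\top. \tag{2.12b} $$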



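For odd $n_0$, the centered and normalized box sequence is defined as

$$ w_n = \begin{cases} 1/\sqrt{n_0}, & \text{for } |n| \le (n_0 - 1)/2; \\ 0, & \text{otherwise}, \end{cases} \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.13a} $$
$$ w = \frac{1}{\sqrt{n_0}}\begin{bmatrix} \cdots & 0 & 1 & \cdots & \boxed{1} & \cdots & 1 & 0 & \cdots \end{bmatrix}^\top. \tag{2.13b} $$

Box sequences are also called rectangular window sequences.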

Often, a finite-length sequence is treated as a glimpse of an infinite-length sequence. One way to state this is by using pointwise multiplication with a window sequence. Multiplying an arbitrary sequence $x$ with the right-sided window $w$ given in (2.12), we obtain a windowed version of $x$:

$$ \tilde{x}_n = x_n w_n, \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.14a} $$
$$ \tilde{x} = \begin{bmatrix} \cdots & 0 & \boxed{x_0} & x_1 & \cdots & x_{n_0-1} & 0 & \cdots \end{bmatrix}^\top. \tag{2.14b} $$

With this use of an unnormalized rectangular window, $\tilde{x}_n$ equals $x_n$ for $n \in \{0, 1, \ldots, n_0 - 1\}$ and is zero otherwise. We sometimes study $x$ through the finite-length sequence $\tilde{x}$ that coincides with $x$ over a window of interest.
Figure 2.2: A sinusoidal sequence $x_n = \sin((\pi/8)n + \pi/2)$ (dashed plots) and its windowed versions $w_n x_n$ (stem plots) with two different windows of length $n_0 = 26$ (solid plots): (a) rectangular window (2.12); (b) raised cosine window (2.15).



How good is the window we just used? For example, if $x$ is smooth, its windowed version $\tilde{x}$ is not, because of the abrupt boundaries of the rectangular window. We might thus decide to use a different window to smooth the boundaries, an example of which we now discuss. We look at other commonly used windows in
Chapter \E\ Exercise [8771 

Example 2.4 (Windows) Consider an infinite-length sinusoidal sequence of frequency $\omega_0$ and phase $\theta$,

$$ x_n = \sin(\omega_0 n + \theta), $$

and the following two windows:

(i) a rectangular, length-$n_0$ window, as in (2.12); and
(ii) a raised cosine window,³⁵ also of length $n_0$:

$$ w_n = \begin{cases} \dfrac{1}{2}\left(1 - \cos\dfrac{2\pi n}{n_0 - 1}\right), & \text{for } 0 \le n \le n_0 - 1; \\ 0, & \text{otherwise}. \end{cases} \tag{2.15} $$



The raised cosine window tapers off smoothly at the boundaries, while the rectangular one does not. The trade-off between the two windows is obvious from Figure 2.2: the rectangular window does not modify the sequence inside the window, but has abrupt transitions at the boundary, while the raised cosine window has smooth transitions at the boundary, but at the price of modifying the sequence inside the window.
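To make the trade-off of Example 2.4 concrete, here is a Python sketch that windows one stretch of a sinusoid with both windows of length $n_0 = 26$. The frequency $\pi/8$ and phase $\pi/2$ are our reading of the caption of Figure 2.2, and the helper names are hypothetical.

```python
import math

def rect_window(n0):
    """Right-sided box window (2.12) over its support n = 0, ..., n0 - 1."""
    return [1.0] * n0

def raised_cosine_window(n0):
    """Raised cosine (Hann) window (2.15): (1 - cos(2 pi n / (n0 - 1))) / 2."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (n0 - 1))) for n in range(n0)]

n0 = 26
omega0, theta = math.pi / 8, math.pi / 2
x = [math.sin(omega0 * n + theta) for n in range(n0)]

rect = [xn * wn for xn, wn in zip(x, rect_window(n0))]
hann = [xn * wn for xn, wn in zip(x, raised_cosine_window(n0))]

# Rectangular: samples unchanged inside, but an abrupt jump to zero outside.
print(rect[0], rect[-1])
# Raised cosine: both end samples are zero, at the price of rescaling the interior.
print(round(hann[0], 12), round(hann[-1], 12))
```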



Deterministic Correlation

We now discuss two operations on sequences, both deterministic, that appear throughout the chapter. Stochastic versions of both operations will be given in Section 2.8.



³⁵ This is also known as a Hann or Hanning window, after Julius von Hann.
Deterministic Autocorrelation The deterministic autocorrelation $a$ of a sequence $x$ is

$$ a_n = \sum_{k\in\mathbb{Z}} x_k x_{k-n}^* = \langle x_k, x_{k-n}\rangle_k, \tag{2.16} $$

where the final expression introduces a notation in which the variable over which to sum, $k$, is explicitly included in the inner product notation. This simplifies our discussion because we can use $x_{k-n}$ instead of a new symbol for this time-reversed and shifted version of $x$. The deterministic autocorrelation satisfies

$$ a_n = a_{-n}^*, \tag{2.17a} $$
$$ a_0 = \sum_{k\in\mathbb{Z}} |x_k|^2 = \|x\|^2, \tag{2.17b} $$

the proof of which is left for Exercise 2.5. The deterministic autocorrelation measures the similarity of a sequence with respect to shifts of itself, and it is Hermitian symmetric as in (2.17a). For a real $x$,

$$ a_n = \sum_{k\in\mathbb{Z}} x_k x_{k-n} = a_{-n}. \tag{2.17c} $$

When we need to specify the sequence involved, we write $a_{x,n}$.

Example 2.5 (Deterministic autocorrelation of a finite-length sequence) Assume $x$ is the box sequence from (2.13a) with $n_0 = 3$, that is, a constant sequence of length 3 and height $1/\sqrt{3}$. Using (2.16), we compute its deterministic autocorrelation to be

$$ a = \begin{bmatrix} \cdots & 0 & \tfrac{1}{3} & \tfrac{2}{3} & \boxed{1} & \tfrac{2}{3} & \tfrac{1}{3} & 0 & \cdots \end{bmatrix}^\top. \tag{2.18} $$

This sequence is clearly symmetric, satisfying (2.17c).
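The following Python sketch computes (2.16) directly for finite-support sequences stored as lists and reproduces (2.18) for the centered box with $n_0 = 3$; it also checks (2.17a) and (2.17b) on a hypothetical complex sequence. The function name and the list-based representation are illustrative assumptions.

```python
import math

def autocorrelation(x):
    """Deterministic autocorrelation (2.16), a_n = sum_k x_k conj(x_{k-n}),
    for a finite-support sequence stored as a list x. Returns (a, a_start),
    where a[0] is the value at lag a_start = -(len(x) - 1)."""
    L = len(x)
    a = []
    for n in range(-(L - 1), L):
        s = 0
        for i in range(L):
            j = i - n                  # list position of x_{k-n} when x_k is x[i]
            if 0 <= j < L:
                s += x[i] * x[j].conjugate()
        a.append(s)
    return a, -(L - 1)

# Centered, normalized box (2.13a) with n0 = 3; its origin does not affect a.
box = [1 / math.sqrt(3)] * 3
a, a_start = autocorrelation(box)
print(a_start, [round(v, 6) for v in a])

# Hypothetical complex sequence to check (2.17a) and (2.17b).
z = [1 + 2j, -1j, 0.5]
az, _ = autocorrelation(z)
print(all(abs(az[i] - az[-1 - i].conjugate()) < 1e-12 for i in range(len(az))))
print(az[len(z) - 1])                  # a_0 = ||z||^2 = 5 + 1 + 0.25
```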

Deterministic Crosscorrelation The deterministic crosscorrelation $c$ of two sequences $x$ and $y$ is

$$ c_n = \sum_{k\in\mathbb{Z}} x_k y_{k-n}^* = \langle x_k, y_{k-n}\rangle_k, \tag{2.19} $$

and is written as $c_{x,y,n}$ to specify the sequences involved. It satisfies

$$ c_{x,y,n} = \Big(\sum_{k\in\mathbb{Z}} y_{k-n} x_k^*\Big)^* \overset{(a)}{=} \Big(\sum_{m\in\mathbb{Z}} y_m x_{m+n}^*\Big)^* = c_{y,x,-n}^*, \tag{2.20a} $$

where (a) follows from the change of variable $m = k - n$ (see also Exercise 2.5). For real $x$ and $y$,

$$ c_{x,y,n} = \sum_{k\in\mathbb{Z}} x_k y_{k-n} = c_{y,x,-n}. \tag{2.20b} $$
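A direct Python transcription of (2.19) for finite-support sequences, used to check the symmetry (2.20a) numerically. The two complex sequences and their starting indices are hypothetical test data.

```python
def crosscorrelation(x, x_start, y, y_start):
    """Deterministic crosscorrelation (2.19), c_n = sum_k x_k conj(y_{k-n}),
    for finite-support sequences given as lists with their starting indices.
    Returns a dict {n: c_n} over all lags where the supports can overlap."""
    c = {}
    lo = x_start - (y_start + len(y) - 1)    # smallest lag with overlap
    hi = (x_start + len(x) - 1) - y_start    # largest lag with overlap
    for n in range(lo, hi + 1):
        s = 0
        for i, xk in enumerate(x):
            j = (x_start + i) - n - y_start  # list position of y_{k-n}
            if 0 <= j < len(y):
                s += xk * y[j].conjugate()
        c[n] = s
    return c

x = [1 + 1j, 2, -1j]      # hypothetical, starting at n = -1
y = [0.5, 1 - 1j]         # hypothetical, starting at n = 0
cxy = crosscorrelation(x, -1, y, 0)
cyx = crosscorrelation(y, 0, x, -1)

# (2.20a): c_{x,y,n} equals the conjugate of c_{y,x,-n}
print(all(abs(cxy[n] - cyx[-n].conjugate()) < 1e-12 for n in cxy))
```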
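Example 2.6 (Deterministic crosscorrelation of two finite-length sequences) Assume $x$ is the box sequence from (2.13a) with $n_0 = 3$, as in Example 2.5, and

$$ y_n = \begin{cases} \sqrt{2}/\sqrt{3}, & n = 0; \\ 1/\sqrt{3}, & n = 1; \\ 0, & \text{otherwise}. \end{cases} $$

Using (2.19), we compute the deterministic crosscorrelations

$$ c_{x,y} = \begin{bmatrix} \cdots & 0 & \tfrac{1}{3} & \tfrac{1+\sqrt{2}}{3} & \boxed{\tfrac{1+\sqrt{2}}{3}} & \tfrac{\sqrt{2}}{3} & 0 & \cdots \end{bmatrix}^\top, \tag{2.21a} $$
$$ c_{y,x} = \begin{bmatrix} \cdots & 0 & \tfrac{\sqrt{2}}{3} & \boxed{\tfrac{1+\sqrt{2}}{3}} & \tfrac{1+\sqrt{2}}{3} & \tfrac{1}{3} & 0 & \cdots \end{bmatrix}^\top; \tag{2.21b} $$

thus, clearly, (2.20b) is satisfied.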



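Deterministic Autocorrelation of Vector Sequences Consider a vector of $N$ sequences, that is, an infinite matrix whose $(k+1)$st row is the sequence $x_k = [\,\cdots\; x_{k,-1}\; x_{k,0}\; x_{k,1}\; \cdots\,]$:

$$ x = \begin{bmatrix} x_0 & x_1 & \cdots & x_{N-1} \end{bmatrix}^\top. $$

Its deterministic autocorrelation is a sequence of matrices given by

$$ A_n = \begin{bmatrix} a_{0,n} & c_{0,1,n} & \cdots & c_{0,N-1,n} \\ c_{1,0,n} & a_{1,n} & \cdots & c_{1,N-1,n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{N-1,0,n} & c_{N-1,1,n} & \cdots & a_{N-1,n} \end{bmatrix}, \tag{2.22} $$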



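that is, a matrix with individual sequence deterministic autocorrelations $a_{i,n}$ on the diagonal and the pairwise deterministic crosscorrelations $c_{i,k,n}$ off the diagonal, for $i, k = 0, 1, \ldots, N-1$, $i \neq k$. Because of (2.17a) and (2.20a), $A_n$ satisfies

$$ A_n = \begin{bmatrix} a_{0,n} & c_{0,1,n} & \cdots & c_{0,N-1,n} \\ c_{0,1,-n}^* & a_{1,n} & \cdots & c_{1,N-1,n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{0,N-1,-n}^* & c_{1,N-1,-n}^* & \cdots & a_{N-1,n} \end{bmatrix} = A_{-n}^{*}, \tag{2.23a} $$

that is, $A_n$ is the Hermitian transpose of $A_{-n}$; in particular, $A_0$ is a Hermitian matrix (see (1.221a)). For a real $x$, $A_0$ is a symmetric matrix and

$$ A_n = A_{-n}^\top. \tag{2.23b} $$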



Example 2.7 (Deterministic autocorrelation of a vector sequence) Assume we are given a vector of two sequences $x = [x_0\; x_1]^\top$ with $x_0 = x$ and $x_1 = y$ from Example 2.6. Its deterministic autocorrelation is then

$$ A_n = \begin{bmatrix} a_{0,n} & c_{0,1,n} \\ c_{1,0,n} & a_{1,n} \end{bmatrix}. $$
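We have already computed three out of four entries in the above matrix: the deterministic autocorrelation sequence $a_0 = a_x$ from (2.18) and the deterministic crosscorrelation sequences $c_{0,1} = c_{x,y}$ from (2.21a) and $c_{1,0} = c_{y,x}$ from (2.21b). The only entry left to compute is the deterministic autocorrelation sequence

$$ a_y = \begin{bmatrix} \cdots & 0 & \tfrac{\sqrt{2}}{3} & \boxed{1} & \tfrac{\sqrt{2}}{3} & 0 & \cdots \end{bmatrix}^\top, \tag{2.24} $$

again, a symmetric sequence. Because of what we already computed in Examples 2.5 and 2.6, that is, the deterministic autocorrelation $a_x$ is symmetric, and $c_{x,y,n} = c_{y,x,-n}$, (2.23b) is satisfied. A few of these $A_n$ are:

$$ A_{-1} = \begin{bmatrix} \tfrac{2}{3} & \tfrac{1+\sqrt{2}}{3} \\ \tfrac{\sqrt{2}}{3} & \tfrac{\sqrt{2}}{3} \end{bmatrix}, \qquad A_0 = \begin{bmatrix} 1 & \tfrac{1+\sqrt{2}}{3} \\ \tfrac{1+\sqrt{2}}{3} & 1 \end{bmatrix}, \qquad A_1 = \begin{bmatrix} \tfrac{2}{3} & \tfrac{\sqrt{2}}{3} \\ \tfrac{1+\sqrt{2}}{3} & \tfrac{\sqrt{2}}{3} \end{bmatrix}. $$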
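The matrix sequence $A_n$ of (2.22) can be assembled from pairwise crosscorrelations. This Python sketch does so for a hypothetical pair of complex sequences stored as dictionaries, checks that $A_n$ equals the Hermitian transpose of $A_{-n}$ for several lags, and reads the squared norms off the diagonal of $A_0$. All names and data are illustrative.

```python
def xcorr(x, y, n):
    """c_n = sum_k x_k conj(y_{k-n}) as in (2.19); sequences are dicts {index: value}."""
    return sum(xk * y[k - n].conjugate() for k, xk in x.items() if k - n in y)

def vector_autocorrelation(seqs, n):
    """A_n as in (2.22): entry (i, k) is c_{i,k,n}, which is a_{i,n} when i == k."""
    return [[xcorr(xi, xk, n) for xk in seqs] for xi in seqs]

def hermitian_transpose(M):
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

# Hypothetical vector of N = 2 complex sequences with finite support.
seqs = [{-1: 1 + 1j, 0: 2, 1: -1j}, {0: 0.5, 1: 1 - 1j}]

matches = all(
    abs(entry - ref) < 1e-12
    for n in range(-3, 4)
    for row, ref_row in zip(vector_autocorrelation(seqs, n),
                            hermitian_transpose(vector_autocorrelation(seqs, -n)))
    for entry, ref in zip(row, ref_row)
)
A0 = vector_autocorrelation(seqs, 0)
print(matches, A0[0][0], A0[1][1])   # diagonal of A_0 holds the squared norms
```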



2.2.2 Finite-Length Sequences

Finite-length sequences as in (2.2) are those with the domain

$$ n \in \{0, 1, \ldots, N-1\} $$

for some positive integer $N$. A finite-length sequence can be seen either as an infinite-length sequence that happens to take nonzero values only inside $\{0, 1, \ldots, N-1\}$ or as a period of a periodic sequence with

$$ x_{n+kN} = x_n, \qquad k\in\mathbb{Z}. \tag{2.25} $$



Sequence Spaces

In the case of a periodic sequence, it is useful to think of the domain itself as wrapping around into a circle, with $N-1$ next to 0. On this discrete circle domain, incrementing the time index is not ordinary addition but rather addition modulo $N$, so we could refer to the domain as $\mathbb{Z}_N$ and to the vector space of these sequences as $\mathbb{C}^{\mathbb{Z}_N}$. We do not actually adopt the notation $\mathbb{C}^{\mathbb{Z}_N}$ because the standard vector space operations (see Definition 1.1) are the same as for $\mathbb{C}^N$.

Differences between periodic sequences (those defined on a circular domain) 
and infinite sequences with finite support emerge with operations that we introduce 
later. For periodic sequences, there is a circular form of convolution. Applying 
spectral theory to this convolution leads to the discrete Fourier transform. As part 
of a more general theory, other convolution operators would lead to different Fourier 
transforms; more details on this topic can be found in Further Reading. 



Special Sequences

Periodic Kronecker Delta Sequences A periodic version of the Kronecker delta sequence is obtained by adding all shifts of $\delta$ by integer multiples of $N$:

$$ \varphi_n = \sum_{\ell\in\mathbb{Z}} \delta_{n-\ell N}, \qquad n\in\mathbb{Z}. $$

The resulting sequence is

$$ \varphi_n = \begin{cases} 1, & \text{for } n = \ell N,\ \ell\in\mathbb{Z}; \\ 0, & \text{otherwise}, \end{cases} \qquad n\in\mathbb{Z}. $$

The set of $N$ sequences generated from this $\varphi$ by shifts in $\{0, 1, \ldots, N-1\}$ spans the space of $N$-periodic sequences.

Complex Exponential Sequences As we will see in Section 2.6, the complex exponential sequences form a natural basis for $N$-periodic sequences. These are the $N$ sequences $\varphi_k$, $k \in \{0, 1, \ldots, N-1\}$, given by

$$ \varphi_{k,n} = \frac{1}{\sqrt{N}}\, e^{j(2\pi/N)kn}, \qquad k\in\{0,1,\ldots,N-1\},\ n\in\mathbb{Z}. \tag{2.26} $$



Each of these sequences is periodic with period N. In Solved Exercise 12.14 we 
explore a few properties of complex exponential sequences. 
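A short Python check, assuming the $1/\sqrt{N}$ normalization in (2.26), that the $N$ complex exponential sequences are orthonormal over one period and $N$-periodic. The choice $N = 8$ and the helper names are arbitrary.

```python
import cmath

def phi(k, n, N):
    """Complex exponential sequence (2.26): e^{j(2 pi/N) k n} / sqrt(N)."""
    return cmath.exp(2j * cmath.pi * k * n / N) / N ** 0.5

def inner(k, l, N):
    """Inner product <phi_k, phi_l> over one period n = 0, ..., N-1."""
    return sum(phi(k, n, N) * phi(l, n, N).conjugate() for n in range(N))

N = 8
orthonormal = all(abs(inner(k, l, N) - (1 if k == l else 0)) < 1e-12
                  for k in range(N) for l in range(N))
periodic = all(abs(phi(3, n + N, N) - phi(3, n, N)) < 1e-12 for n in range(-N, N))
print(orthonormal, periodic)
```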

2.2.3 Multidimensional Sequences 

Two-Dimensional Sequences 

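Today, one of the most widespread devices is the digital camera. In our notation, a digital picture is a two-dimensional sequence, $x_{n,m}$. It can be seen either as an infinite-length sequence with a finite number of nonzero samples,

$$ x_{n,m}, \qquad n, m \in \mathbb{Z}, \tag{2.27} $$

or as a sequence with domain $n \in \{0, 1, \ldots, N-1\}$, $m \in \{0, 1, \ldots, M-1\}$, conveniently expressed as a matrix:

$$ x = \begin{bmatrix} x_{0,0} & x_{0,1} & \cdots & x_{0,M-1} \\ x_{1,0} & x_{1,1} & \cdots & x_{1,M-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-1,0} & x_{N-1,1} & \cdots & x_{N-1,M-1} \end{bmatrix}. \tag{2.28} $$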



While circularly extending the image at the borders is perhaps not natural (the top of the image appears next to the bottom), it is the extension that leads to the use of the discrete Fourier transform, as we will see later in the chapter. Each element $x_{n,m}$ is called a pixel, and the total image has $NM$ pixels. In reality, for $x_{n,m}$ to represent a color image, it must have more than one component; often, red, green and blue components are used (RGB space). Figure 2.3 gives examples of higher-dimensional sequences.
Figure 2.3: Multidimensional sequences. (a) Two-dimensional separable sinusoidal sequence. (b) Two-dimensional nonseparable sinusoidal sequence. (c) Earth visible above the lunar surface, taken by Apollo 8 crew member Bill Anders on December 24, 1968. This could be considered a two-dimensional sequence if the image were gray scale representing the intensity, or a higher-dimensional sequence depending on how color is represented.



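Sequence space                  Symbol                        Finite norm
Absolutely summable             $\ell^1(\mathbb{Z}^2)$        $\|x\|_1 = \sum_{n,m\in\mathbb{Z}} |x_{n,m}|$
Square summable/finite energy   $\ell^2(\mathbb{Z}^2)$        $\|x\|_2 = \big(\sum_{n,m\in\mathbb{Z}} |x_{n,m}|^2\big)^{1/2}$
Bounded                         $\ell^\infty(\mathbb{Z}^2)$   $\|x\|_\infty = \sup_{n,m\in\mathbb{Z}} |x_{n,m}|$

Table 2.2: Norms and two-dimensional sequence spaces.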



Sequence Spaces The spaces we introduced in one dimension generalize to multiple dimensions; for example, in two dimensions the inner product of two sequences $x$ and $y$ is

$$ \langle x, y\rangle = \sum_{n\in\mathbb{Z}}\sum_{m\in\mathbb{Z}} x_{n,m}\, y_{n,m}^*, \tag{2.29} $$

while the $\ell^2$ norm and the appropriate space $\ell^2(\mathbb{Z}^2)$ are given in Table 2.2, together with other relevant norms and spaces. For example, a digital picture, having finite size and bounded pixel values, clearly belongs to all three spaces defined in Table 2.2. Infinite-length multidimensional sequences, on the other hand, can be harder to analyze.

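Figure 2.4: A discrete-time system.

Example 2.8 (Norms of two-dimensional sequences) Consider the sequence

$$ x_{n,m} = \frac{1}{2^n 3^m}, \qquad n, m \in \mathbb{N}. $$

Its $\ell^2$ norm can be evaluated as³⁶

$$ \langle x, x\rangle = \sum_{n\in\mathbb{N}}\sum_{m\in\mathbb{N}} \frac{1}{4^n 9^m} = \Big(\sum_{n\in\mathbb{N}} \frac{1}{4^n}\Big)\Big(\sum_{m\in\mathbb{N}} \frac{1}{9^m}\Big) = \frac{4}{3}\cdot\frac{9}{8} = \frac{3}{2}, $$

yielding $\|x\| = \sqrt{3/2}$. Similarly, $\|x\|_1 = 3$ and $\|x\|_\infty = 1$.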
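Since the sequence in Example 2.8 is separable, its norms factor into products of geometric series. This Python sketch confirms the three values numerically on a truncated $60\times 60$ corner of the sequence; the truncation size is an arbitrary choice whose effect is far below double precision.

```python
import math

K = 60      # truncation: keep x_{n,m} for 0 <= n, m < K
x = [[2.0 ** -n * 3.0 ** -m for m in range(K)] for n in range(K)]

norm1 = sum(abs(v) for row in x for v in row)
norm2 = math.sqrt(sum(abs(v) ** 2 for row in x for v in row))
norm_inf = max(abs(v) for row in x for v in row)
print(norm1, norm2 ** 2, norm_inf)   # approximately 3, 3/2 and 1
```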



2.3 Systems

Discrete-time systems are operators having discrete-time sequences as their inputs and outputs. Among all discrete-time systems, we will concentrate on those that are linear and shift invariant. This subclass is both important in practice and amenable to easy analysis. The moving average filter in (2.5) is such a linear, shift-invariant system. After an introduction to difference equations, which are natural descriptions of discrete-time systems, we study linear, shift-invariant systems in detail.



2.3.1 Discrete-Time Systems and Their Properties

A discrete-time system is an operator $T$ that maps an input sequence $x \in V$ into an output sequence $y \in V$,

$$ y = T(x), \tag{2.30} $$

as shown in Figure 2.4. As we have seen in the previous section, the sequence space $V$ is typically $\ell^2(\mathbb{Z})$ or $\ell^\infty(\mathbb{Z})$. At times, the input or the output is in a subspace of such spaces.



Types of Systems 

Discrete-time systems can have a number of useful properties. We will encounter these same properties in Chapter \S\ as well. After defining key properties, we will illustrate them on certain basic systems.



³⁶ We interchange summations freely, which can be done because each one-dimensional sequence involved is absolutely summable. When this is not the case, one has to be careful, as discussed in Appendix 1.A.2.
Linear Systems Similarly to Definition 1.17, linearity³⁷ combines two properties: additivity (the output of a sum of sequences is the sum of the outputs of the sequences) and scaling (the output of a scaled sequence is the scaled output of the sequence).

Definition 2.1 (Linear system) A discrete-time system $T$ is called linear when, for any inputs $x$ and $y$ and any $\alpha, \beta \in \mathbb{C}$,

$$ T(\alpha x + \beta y) = \alpha T(x) + \beta T(y). \tag{2.31} $$

The function $T$ is thus a linear operator, and we write (2.30) as

$$ y = Tx. \tag{2.32} $$



We will often use a matrix representation for a linear system, especially when the structure of the matrix reveals properties of the system.

As discussed in Section 1.5.5, a linear operator has a unique matrix representation once bases have been chosen for the domain and codomain of the operator. Throughout this chapter, matrix representations of linear systems will be with respect to the standard basis (the Kronecker delta sequence and its shifts) for both the inputs and outputs. The general form of the matrix representation then follows from (1.152): column $k$ holds the output that results from taking the shifted Kronecker delta sequence $\delta_{n-k}$ as the input. To be more explicit, for each $k \in \mathbb{Z}$, let input $x^{(k)}$ result in output $y^{(k)}$, where

$$ x_n^{(k)} = \delta_{n-k}, \qquad n\in\mathbb{Z}. $$

Then the matrix representation of the system is

$$ T = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & y_{-1}^{(-1)} & y_{-1}^{(0)} & y_{-1}^{(1)} & \cdots \\ \cdots & y_{0}^{(-1)} & \boxed{y_{0}^{(0)}} & y_{0}^{(1)} & \cdots \\ \cdots & y_{1}^{(-1)} & y_{1}^{(0)} & y_{1}^{(1)} & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \tag{2.33} $$
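The column-by-column construction of (2.33) can be mimicked on a finite window of indices. In this Python sketch, systems act on sequences represented as functions of the index; the example system (the causal three-point average) and the window $\{-3,\ldots,3\}$ are our own choices. The resulting matrix comes out Toeplitz and lower triangular, as expected for a linear, shift-invariant, causal system.

```python
def causal_average(x):
    """Example LSI, causal system: y_n = (x_{n-2} + x_{n-1} + x_n) / 3.
    Sequences are represented as Python functions of the integer index n."""
    return lambda n: (x(n - 2) + x(n - 1) + x(n)) / 3

def shifted_delta(k):
    """The shifted Kronecker delta delta_{n-k}."""
    return lambda n: 1.0 if n == k else 0.0

idx = list(range(-3, 4))                       # finite window of indices
# Column k of the matrix is the response to delta_{n-k}, as in (2.33).
responses = {k: causal_average(shifted_delta(k)) for k in idx}
T = [[responses[k](n) for k in idx] for n in idx]   # row index n, column index k

for row in T:
    print(" ".join(f"{v:.3f}" for v in row))

size = len(idx)
toeplitz = all(T[i][j] == T[i + 1][j + 1] for i in range(size - 1) for j in range(size - 1))
lower_triangular = all(T[i][j] == 0 for i in range(size) for j in range(i + 1, size))
print(toeplitz, lower_triangular)
```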



Memoryless Systems Certain simple systems are instantaneous in that they act only based on the current input sample. It follows that if two inputs agree at a time index $k$, the corresponding outputs must also agree at time index $k$. For a mathematical representation of memorylessness, we use the domain restriction operator defined in (1.57).

³⁷ In the engineering literature, linearity and the superposition principle are often used interchangeably.
Definition 2.2 (Memoryless system) A discrete-time system $T$ is called memoryless when, for any integer $k$ and inputs $x$ and $x'$,

$$ 1_{\{k\}}x = 1_{\{k\}}x' \;\Longrightarrow\; 1_{\{k\}}T(x) = 1_{\{k\}}T(x'). \tag{2.34} $$

In a matrix representation of a linear and memoryless system, the matrix will be diagonal; we will illustrate this and other properties of matrix representations of linear systems in several examples shortly.

Causal Systems The output of a causal system at time index k depends on the 
input only up to time index k. It follows that if two inputs agree up to time k, the 
corresponding outputs must agree up to time k. 



Definition 2.3 (Causal system) A discrete-time system $T$ is called causal when, for any integer $k$ and inputs $x$ and $x'$,

$$ 1_{\{-\infty,\ldots,k\}}x = 1_{\{-\infty,\ldots,k\}}x' \;\Longrightarrow\; 1_{\{-\infty,\ldots,k\}}T(x) = 1_{\{-\infty,\ldots,k\}}T(x'). \tag{2.35} $$

In a matrix representation of a linear and causal system, the matrix will be lower triangular.

Since a computation cannot depend on inputs that will only be provided in 
the future, causality can seem to be a property that is required of any implemented 
system. However, this view takes the concept of the time index representing time 
too literally. First, the discrete time index may represent something else entirely, 
like a physical location along a line; the data can then be processed in any order. 
Second, when the time index indeed represents time, the time origins of the input 
and output need not coincide; then, causality sometimes represents nothing more 
than a convenient convention for aligning time indexes of the input and output. 

Shift-Invariant Systems In a shift-invariant system, shifting the input has the 
effect of shifting the output by the same amount: 

Definition 2.4 (Shift-invariant system) A discrete-time system $T$ is called shift invariant when, for any integer $k$ and input $x$,

$$ y = T(x) \;\Longrightarrow\; y' = T(x'), \quad\text{where } x'_n = x_{n-k} \text{ and } y'_n = y_{n-k}. \tag{2.36} $$

In a matrix representation of a linear and shift-invariant system, the matrix will be Toeplitz.

Shift invariance (or, when it corresponds to time, time invariance) is often a desirable property. For example, an MP3 player should produce the same music from the same file on Tuesday as on Monday. Moreover, linear shift-invariant (LSI) or linear time-invariant (LTI) systems have desirable mathematical properties. Much of the remainder of this section and Sections 2.4 and 2.5 are devoted to the powerful analysis techniques that apply to LSI systems. Sections 2.6 and 2.7 include variations on shift invariance and the corresponding techniques.

Stable Systems A critical property for a discrete-time system is its stability. While various definitions exist, they all require that the system remain well behaved when presented with a certain class of inputs. We define bounded-input bounded-output (BIBO) stability here, because it is both practical and easy to check in cases of interest.

Definition 2.5 (BIBO stability) A discrete-time system $T$ is called bounded-input bounded-output (BIBO) stable when a bounded input produces a bounded output:

$$ x \in \ell^\infty(\mathbb{Z}) \;\Longrightarrow\; T(x) \in \ell^\infty(\mathbb{Z}). \tag{2.37} $$

In a matrix representation of a linear and BIBO-stable system, every row of the matrix will be absolutely summable. The corresponding result for LSI systems is developed fully in Section 2.3.3.

The definition of BIBO stability involves the $\ell^\infty$ norm, so we can see immediately that a system that is linear and BIBO stable is a bounded linear operator from $\ell^\infty(\mathbb{Z})$ to $\ell^\infty(\mathbb{Z})$. The absolute-summability condition on the system that ensures BIBO stability also ensures that the system is a bounded linear operator from $\ell^2(\mathbb{Z})$ to $\ell^2(\mathbb{Z})$ (see Exercise TBD). Thus, when we limit attention to BIBO stable systems, we are able to use the various results for bounded linear operators on a Hilbert space that were developed in Chapter 1.

Basic Systems 

We now discuss a few basic discrete-time systems. These include some basic building blocks that we will use frequently. Their properties are summarized in Table 2.3.



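Shift The shift-by-1, or delay, operator is defined as

$$ y_n = x_{n-1}, \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.38a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots \\ x_{-2} \\ x_{-1} \\ x_0 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ \cdots & 0 & 0 & 0 & \cdots \\ \cdots & 1 & \boxed{0} & 0 & \cdots \\ \cdots & 0 & 1 & 0 & \cdots \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}. \tag{2.38b} $$

It is an LSI operator, causal and BIBO stable, but not memoryless; the matrix is Toeplitz, with a single nonzero off diagonal. A shift by $k$, $k > 0$, is obtained by applying the delay operator $k$ times.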

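The advance-by-1 operator, which maps $x_n$ into $x_{n+1}$, is the inverse of the shift-by-1 operator (2.38):

$$ y_n = x_{n+1}, \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.39a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots \\ x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ \cdots & 0 & 1 & 0 & \cdots \\ \cdots & 0 & \boxed{0} & 1 & \cdots \\ \cdots & 0 & 0 & 0 & \cdots \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}. \tag{2.39b} $$

It is an LSI operator, and BIBO stable, but neither memoryless nor causal; the matrix is Toeplitz and upper triangular with a single nonzero off diagonal. While it is obvious that the matrix in (2.39b) is the transpose of the one in (2.38b), it is also true that these matrices are inverses of each other. (Any finite-sized truncation of the matrix in (2.38b) or (2.39b), centered at the origin, is not invertible.)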
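The following Python sketch, with sequences stored as dictionaries from index to value (an illustrative choice), checks that delay and advance undo each other on a finite-support input. It also shows why a centered truncation of the delay matrix cannot be inverted: its first row is zero, and the product of the truncated advance and delay matrices misses one diagonal entry, the sample that the infinite matrices would bring in from outside the window.

```python
def delay(x):
    """Shift-by-1 (2.38) on a finite-support sequence stored as {index: value}."""
    return {n + 1: v for n, v in x.items()}

def advance(x):
    """Advance-by-1 (2.39): y_n = x_{n+1}."""
    return {n - 1: v for n, v in x.items()}

x = {-1: 4.0, 0: -2.0, 3: 7.0}          # hypothetical input
print(advance(delay(x)) == x, delay(advance(x)) == x)

# Truncate the delay matrix to indices -M..M: row i (output y_i) has its 1 in
# column j = i - 1, so the top row (y_{-M} needs x_{-M-1}) is entirely zero.
M = 2
size = 2 * M + 1
D = [[1 if i == j + 1 else 0 for j in range(size)] for i in range(size)]
A = [[D[j][i] for j in range(size)] for i in range(size)]   # truncated advance = D^T
AD = [[sum(A[i][k] * D[k][j] for k in range(size)) for j in range(size)]
      for i in range(size)]
print(all(v == 0 for v in D[0]), [AD[i][i] for i in range(size)])
```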



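Modulator Consider pointwise multiplication of a sequence $x_n$ by $(-1)^n$:

$$ y_n = (-1)^n x_n = \begin{cases} x_n, & \text{for even } n; \\ -x_n, & \text{for odd } n, \end{cases} \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.40a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots \\ -x_{-1} \\ x_0 \\ -x_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ & -1 & 0 & 0 & \\ & 0 & \boxed{1} & 0 & \\ & 0 & 0 & -1 & \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}. \tag{2.40b} $$

This is the simplest example of modulation, that is, change of frequency³⁸ of the sequence. For example, a constant sequence $x_n = 1$ turns into a fast-varying (high-frequency) sequence $y_n = (-1)^n$:

$$ \begin{bmatrix} \cdots & 1 & 1 & \boxed{1} & 1 & 1 & \cdots \end{bmatrix}^\top \;\longmapsto\; \begin{bmatrix} \cdots & 1 & -1 & \boxed{1} & -1 & 1 & \cdots \end{bmatrix}^\top. $$

This operator is linear, causal, memoryless and BIBO stable, but not shift invariant; the matrix is diagonal.

³⁸ While we have not defined the notion of frequency yet, you may think of it as a rate of variation in a sequence; the more the sequence varies in a given interval, the higher the frequency.

A more general version of (2.40a) would involve a sequence $a_n$ multiplying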
the input:

$$ y_n = a_n x_n, \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.41a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots \\ a_{-1}x_{-1} \\ a_0 x_0 \\ a_1 x_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ & a_{-1} & 0 & 0 & \\ & 0 & \boxed{a_0} & 0 & \\ & 0 & 0 & a_1 & \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}. \tag{2.41b} $$

Like (2.40), it is linear, causal and memoryless, and it is BIBO stable when the sequence $a$ is bounded, but it is not shift invariant; the matrix is again diagonal.



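Accumulator The output of the accumulator is akin to the integral of the input:

$$ y_n = \sum_{k=-\infty}^{n} x_k, \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.42a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} \ddots & & & & \\ \cdots & 1 & 0 & 0 & \\ \cdots & 1 & \boxed{1} & 0 & \\ \cdots & 1 & 1 & 1 & \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}. \tag{2.42b} $$

This is an LSI, causal operator, but neither memoryless nor BIBO stable; the matrix is Toeplitz and lower triangular.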

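If the input signal is restricted to be zero for $n < 0$, (2.42) reduces to

$$ y_n = \sum_{k=0}^{n} x_k, \qquad n\in\mathbb{N}, \quad\text{or,} \tag{2.43a} $$
$$ \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots \\ 1 & 1 & 0 & \cdots \\ 1 & 1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}. \tag{2.43b} $$

This is an LSI, causal operator, but neither memoryless nor BIBO stable; the matrix is Toeplitz and lower triangular.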

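Weighting by dividing (2.43a) by the number of terms involved turns the accumulator into a running average:

$$ y_n = \frac{1}{n+1}\sum_{k=0}^{n} x_k, \qquad n\in\mathbb{N}, \quad\text{or,} \tag{2.44a} $$
$$ \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots \\ \frac{1}{2} & \frac{1}{2} & 0 & \cdots \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}. \tag{2.44b} $$

This is a linear operator, causal and BIBO stable, but neither shift invariant nor memoryless; the matrix is lower triangular.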

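Other weight functions are possible, such as a decaying exponential weighting of the entries with factor $\alpha \in (0, 1)$:

$$ y_n = \sum_{k=0}^{n} \alpha^{n-k} x_k, \qquad n\in\mathbb{N}, \quad\text{or,} \tag{2.45a} $$
$$ \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots \\ \alpha & 1 & 0 & \cdots \\ \alpha^2 & \alpha & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}. \tag{2.45b} $$

This is an LSI, causal operator, but not memoryless; the matrix is Toeplitz and lower triangular. It is BIBO stable because $|\alpha| < 1$.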
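The accumulator-type systems (2.43) to (2.45) can all be computed with a one-sample recursion. This Python sketch, with hypothetical function names, applies them to a unit step of length 1000: the accumulator output keeps growing, consistent with its lack of BIBO stability, while the running average and the exponentially weighted sum with $\alpha = 0.5$ stay bounded.

```python
def accumulate(x):
    """Accumulator for inputs starting at n = 0 (2.43a): y_n = sum_{k=0}^{n} x_k."""
    y, total = [], 0.0
    for xn in x:
        total += xn            # y_n = y_{n-1} + x_n
        y.append(total)
    return y

def running_average(x):
    """Running average (2.44a): y_n = (1/(n+1)) sum_{k=0}^{n} x_k."""
    return [yn / (n + 1) for n, yn in enumerate(accumulate(x))]

def exp_weighted(x, alpha):
    """Exponential weighting (2.45a), computed recursively as
    y_n = alpha * y_{n-1} + x_n, which equals sum_{k=0}^{n} alpha^{n-k} x_k."""
    y, prev = [], 0.0
    for xn in x:
        prev = alpha * prev + xn
        y.append(prev)
    return y

x = [1.0] * 1000          # bounded input: the unit step on N, truncated
print(accumulate(x)[-1], running_average(x)[-1], exp_weighted(x, 0.5)[-1])
```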

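Averaging Operators Consider a system that averages neighboring values, for example,

$$ y_n = \tfrac{1}{3}(x_{n-1} + x_n + x_{n+1}), \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.46a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \frac{1}{3}\begin{bmatrix} \ddots & & & & \\ \cdots & 1 & 1 & 0 & \cdots \\ \cdots & 1 & \boxed{1} & 1 & \cdots \\ \cdots & 0 & 1 & 1 & \cdots \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}. \tag{2.46b} $$

As we have seen in Example 2.2, this is a moving average filter with $N = 3$. It is called moving average since we look at the sequence through a window of size 3, compute the average value, and then move the window to compute the next average. This operator is LSI and BIBO stable, but neither memoryless nor causal; the matrix is Toeplitz.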
We could obtain a causal version by simply delaying the moving average by $(N-1)/2$ samples in (2.5). For $N = 3$ as here, this results in

$$ y_n = \tfrac{1}{3}(x_{n-2} + x_{n-1} + x_n), \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.47a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \frac{1}{3}\begin{bmatrix} \ddots & & & & \\ \cdots & 1 & 0 & 0 & \cdots \\ \cdots & 1 & \boxed{1} & 0 & \cdots \\ \cdots & 1 & 1 & 1 & \cdots \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ \vdots \end{bmatrix}, \tag{2.47b} $$

a delayed-by-1 version of (2.46). This operator is again LSI and BIBO stable but also causal, while still not memoryless; the matrix is Toeplitz and lower triangular.
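An alternative is a block average,

$$ y_n = \tfrac{1}{3}(x_{3n-1} + x_{3n} + x_{3n+1}), \qquad n\in\mathbb{Z}, \quad\text{or,} \tag{2.48a} $$
$$ \begin{bmatrix} \vdots \\ y_{-1} \\ \boxed{y_0} \\ y_1 \\ \vdots \end{bmatrix} = \frac{1}{3}\begin{bmatrix} \ddots & & & & & & & \\ \cdots & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & \boxed{1} & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 1 & 1 & \cdots \\ & & & & & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-3} \\ x_{-2} \\ x_{-1} \\ \boxed{x_0} \\ x_1 \\ x_2 \\ x_3 \\ \vdots \end{bmatrix}. \tag{2.48b} $$

It is easy to see that (2.48a) is simply (2.46a) evaluated at multiples of 3. Similarly, the matrix in (2.48b) contains only every third row of the one in (2.46b). This is a linear and BIBO stable operator; it is neither shift invariant, nor memoryless, nor causal; the matrix is block diagonal.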
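A Python sketch, with sequences represented as functions of the index and a hypothetical input $x_n = n^2$, checking that (2.47a) is (2.46a) delayed by one sample, that (2.48a) is (2.46a) evaluated at multiples of 3, and that block averaging, unlike the moving average, is not shift invariant.

```python
def moving_average(x, n):
    """Centered three-point moving average (2.46a); x maps an integer index to a value."""
    return (x(n - 1) + x(n) + x(n + 1)) / 3

def causal_moving_average(x, n):
    """Causal version (2.47a): uses x_{n-2}, x_{n-1}, x_n."""
    return (x(n - 2) + x(n - 1) + x(n)) / 3

def block_average(x, n):
    """Block average (2.48a): averages the block x_{3n-1}, x_{3n}, x_{3n+1}."""
    return (x(3 * n - 1) + x(3 * n) + x(3 * n + 1)) / 3

def x(n):
    return float(n * n)          # hypothetical input x_n = n^2

def x_shift(n):
    return x(n - 1)              # the same input delayed by one sample

ns = range(-10, 11)
is_delayed = all(causal_moving_average(x, n) == moving_average(x, n - 1) for n in ns)
is_subsampled = all(block_average(x, n) == moving_average(x, 3 * n) for n in ns)
# Shift invariance: delaying the input by one must delay the output by one.
ma_shift_inv = all(moving_average(x_shift, n) == moving_average(x, n - 1) for n in ns)
block_shift_inv = all(block_average(x_shift, n) == block_average(x, n - 1) for n in ns)
print(is_delayed, is_subsampled, ma_shift_inv, block_shift_inv)
```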

A nonlinear version of the averaging operator could be

$$ y_n = \operatorname{median}\big([\,x_{n-1}\;\; x_n\;\; x_{n+1}\,]\big). \tag{2.49} $$

Instead of the average of the three terms, this operator takes the median value. This operator is shift invariant and BIBO stable but clearly neither linear, nor causal, nor memoryless.

Maximum Operator This simple operator computes the maximum value of the input up to the current time:

$$ y_n = \max\big([\,\cdots\;\; x_{n-2}\;\; x_{n-1}\;\; x_n\,]\big). \tag{2.50} $$

This operator is clearly neither linear nor memoryless, but it is causal, shift invariant, and BIBO stable.
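The nonlinearity of the median (2.49) and maximum (2.50) operators shows up already on tiny inputs. In this Python sketch the inputs are finite lists, and `running_max` simply ignores indices before the first sample (an assumption made for illustration); `statistics.median` is from the Python standard library.

```python
import statistics

def median3(x, n):
    """Median operator (2.49) at index n of a list x (requires 1 <= n <= len(x) - 2)."""
    return statistics.median([x[n - 1], x[n], x[n + 1]])

def running_max(x):
    """Maximum operator (2.50) on a finite list; indices before x[0] are ignored."""
    y, m = [], float("-inf")
    for xn in x:
        m = max(m, xn)
        y.append(m)
    return y

a, b = [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]
s = [p + q for p, q in zip(a, b)]
print(median3(a, 1) + median3(b, 1), median3(s, 1))      # 0.0 3.0: not additive

print(running_max([1.0, 0.0]), running_max([0.0, 1.0]))  # [1.0, 1.0] [0.0, 1.0]
print(running_max([1.0, 1.0]))                            # [1.0, 1.0], not [1.0, 2.0]
```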
                     Eq.      Linear    Shift invariant  Causal      Memoryless  BIBO stable
                              Def. 2.1  Def. 2.4         Def. 2.3    Def. 2.2    Def. 2.5
Shift (delay)        (2.38)   yes       yes              yes         no          yes
  advance            (2.39)   yes       yes              no          no          yes
Modulator            (2.40)   yes       no               yes         yes         yes
  general            (2.41)   yes       no               yes         yes         yes (a bounded)
Accumulator          (2.42)   yes       yes              yes         no          no
  restricted input   (2.43)   yes       yes              yes         no          no
  weighted           (2.44)   yes       no               yes         no          yes
  exp. weighted      (2.45)   yes       yes              yes         no          yes (|α| < 1)
Averaging oper.      (2.46)   yes       yes              no          no          yes
  causal             (2.47)   yes       yes              yes         no          yes
  block              (2.48)   yes       no               no          no          yes
  median             (2.49)   no        yes              no          no          yes
Maximum oper.        (2.50)   no        yes              yes         no          yes
Matrix representation         exists    Toeplitz         lower       diagonal    absolutely
                                                         triangular              summable rows

Table 2.3: Basic discrete-time systems and their properties. Matrix representation assumes linearity.



2.3.2 Difference Equations

An important class of discrete-time systems can be described by linear difference equations that relate the input sequence and past outputs to the current output,

$$ y_n = \sum_{k\in\mathbb{Z}} b_k^{(n)} x_{n-k} - \sum_{k=1}^{\infty} a_k^{(n)} y_{n-k}. \tag{2.51} $$

If we require shift invariance, then the coefficients $a_k^{(n)}$ and $b_k^{(n)}$ are constant (do not depend on $n$), and we get a linear, constant-coefficient difference equation,

$$ y_n = \sum_{k\in\mathbb{Z}} b_k x_{n-k} - \sum_{k=1}^{\infty} a_k y_{n-k}. \tag{2.52} $$

Such an equation does not determine whether a system is causal. However, (2.52) is suggestive of a recursive computation of the output, forward in time (increasing $n$); we will concentrate on such solutions. To make the system causal, we restrict the dependence on $x$ to the current and past values, leading to

$$ y_n = \sum_{k=0}^{\infty} b_k x_{n-k} - \sum_{k=1}^{\infty} a_k y_{n-k}. \tag{2.53} $$
Realizable systems will have only a finite number of nonzero coefficients $a_k$, $k \in \{1, 2, \ldots, N\}$ and $b_k$, $k \in \{0, 1, \ldots, M\}$, reducing (2.53) to

$$ y_n = \sum_{k=0}^{M} b_k x_{n-k} - \sum_{k=1}^{N} a_k y_{n-k}. \tag{2.54} $$

We discuss finding solutions to such difference equations in Appendix 2.A.2.

Example 2.9 (Difference equation of the accumulator) As an example, consider the accumulator seen in (2.42a):

$$ y_n = \sum_{k=-\infty}^{n} x_k = x_n + \sum_{k=-\infty}^{n-1} x_k = x_n + y_{n-1}, \tag{2.55} $$

which is of the form (2.54), with $b_0 = 1$, $a_1 = -1$. The infinite sum has been turned into a recursive formula (2.55), showing also how one could implement the accumulator: to obtain the current output $y_n$, add the current input $x_n$ to the previously saved output $y_{n-1}$.

Let us take $x_n = \delta_n$, and see what the accumulator does. Assume we are given $y_{-1} = \beta$. Then, for $n \ge 0$,

$$ y_0 = x_0 + y_{-1} = 1 + \beta, \quad y_1 = x_1 + y_0 = 1 + \beta, \quad \ldots, \quad y_n = 1 + \beta, \quad \ldots $$

Thus, the accumulator does exactly what it is supposed to do: at time $n = 0$, it adds the value of the input $x_0 = 1$ to the previously saved output $y_{-1} = \beta$, and then stays constant as the input for all $n > 0$ is zero. For $n < 0$, we can solve (2.55) by expressing $y_{n-1} = y_n - x_n$; it is easy to see that $y_n = \beta$ for all $n < 0$. Together, the expressions for $n \ge 0$ and $n < 0$ lead to

$$ y_n = \beta + u_n, \tag{2.56} $$

that is, the initial value before the input is applied plus the input from the moment it is applied on.

From the above example, we see that unless the initial conditions are zero, the system is not linear, that is, one could have a zero input producing a nonzero output. Similarly, the system is shift invariant only if the initial conditions are zero. These properties are fundamental and hold beyond the case of the accumulator: difference equations as in (2.54) are linear and shift invariant if and only if initial conditions are zero. This also means that the homogeneous solution is necessarily zero (see Exercise [23]).
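A Python sketch of the recursion (2.55) started from a nonzero initial condition. It reproduces $y_n = \beta + u_n$ from (2.56) and illustrates why a nonzero initial condition breaks linearity: a zero input gives a nonzero output, and doubling the input does not double the output. The value $\beta = 2.5$ and the helper names are hypothetical.

```python
def accumulator(x, y_before, n_start, n_end):
    """Run y_n = x_n + y_{n-1} (2.55) forward for n_start <= n <= n_end,
    given y_{n_start - 1} = y_before. x maps an integer index to a value."""
    y, prev = {}, y_before
    for n in range(n_start, n_end + 1):
        prev = x(n) + prev
        y[n] = prev
    return y

def delta(n):
    return 1.0 if n == 0 else 0.0

beta = 2.5                      # hypothetical initial value
# Start before n = 0 with the value beta, so y_n = beta until the impulse arrives.
y = accumulator(delta, beta, -5, 5)
print(y)                        # beta for n < 0 and beta + 1 for n >= 0, as in (2.56)

zero_out = accumulator(lambda n: 0.0, beta, -5, 5)
doubled = accumulator(lambda n: 2 * delta(n), beta, -5, 5)
print(zero_out[0], doubled[0], 2 * y[0])   # 2.5 4.5 7.0: not linear when beta != 0
```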

2.3.3 Linear Shift-Invariant Systems

Impulse Response

A linear operator is specified by its outputs in response to each element of a basis for its domain space (see Section 1.5.5). As we saw in (2.33), this allows a matrix representation of a linear discrete-time system to be formed from the output sequences in response to the Kronecker delta sequence and its shifts as inputs. When, in addition, the system is shift invariant, to satisfy (2.36) all these output sequences are themselves related by shifting. Thus, the system is specified completely by the output sequence resulting from the Kronecker delta sequence as the input.



Definition 2.6 (Impulse response) A sequence $h$ is called the impulse response of LSI discrete-time system $T$ when input $\delta$ produces output $h$.

The impulse response $h$ of a causal linear system always satisfies $h_n = 0$ for all $n < 0$. This is required because, according to (2.35), the output in response to input $\delta$ must match on $\{-\infty, \ldots, -2, -1\}$ the output that results from the all-zero input sequence, and that output is zero by linearity.

Example 2.10 (Impulse response from a difference equation) The linear, constant-coefficient difference equation (2.53) with zero initial conditions represents an LSI system. An impulse response of the system is an output that results from Kronecker delta input, $x = \delta$. Thus, an impulse response satisfies

$$h_n \overset{(a)}{=} \sum_{k=0}^{\infty} b_k \delta_{n-k} - \sum_{k=1}^{\infty} a_k h_{n-k} \overset{(b)}{=} b_n - \sum_{k=1}^{\infty} a_k h_{n-k}, \qquad (2.57)$$



where (a) follows from (2.53); and (b) from the sifting property of the Kronecker delta sequence (see Table 2.1).

If we restrict our attention to causal systems, then the LCCDE uniquely specifies the system. The impulse response satisfies $h_n = 0$ for all $n < 0$, and $h_n$ can be computed for all $n \ge 0$ by using (2.57) recursively for $n = 0, 1, \ldots$.
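As a sketch of this recursion, assuming finitely many nonzero coefficients $b_0, \ldots, b_M$ and $a_1, \ldots, a_N$ (the function name `lccde_impulse_response` and the list layout are illustrative choices, not from the text), the causal impulse response can be computed sample by sample from (2.57):

```python
def lccde_impulse_response(b, a, num_samples):
    """Causal impulse response of an LCCDE via the recursion (2.57).

    b: [b_0, b_1, ..., b_M]  (feedforward coefficients)
    a: [a_1, a_2, ..., a_N]  (feedback coefficients; note the list starts at a_1)
    Returns [h_0, ..., h_{num_samples-1}]; h_n = 0 for n < 0 is implicit (causality).
    """
    h = []
    for n in range(num_samples):
        # Sifting property: sum_k b_k delta_{n-k} = b_n (zero beyond the last coefficient).
        val = b[n] if n < len(b) else 0.0
        # Feedback term -sum_{k>=1} a_k h_{n-k}; terms with n - k < 0 vanish by causality.
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                val -= a_k * h[n - k]
        h.append(val)
    return h


# Accumulator y_n = y_{n-1} + x_n, i.e., b_0 = 1 and a_1 = -1.
print(lccde_impulse_response([1.0], [-1.0], 5))   # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Each new sample uses only already computed past samples, which is exactly why causality makes the LCCDE specify the system uniquely.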

Convolution 

The impulse response and its shifts form the columns of the matrix representation 
of an LSI system, as in (2.33) . Expressing this as a summation is instructive and 
introduces the key concept of convolution. 

Since a general input $x$ to LSI system $T$ can be written as $x_n = \sum_{k\in\mathbb{Z}} x_k \delta_{n-k}$, we can express the output as

$$y = Tx = T\Big(\sum_{k\in\mathbb{Z}} x_k \delta_{n-k}\Big) \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k\, T\delta_{n-k} \overset{(b)}{=} \sum_{k\in\mathbb{Z}} x_k h_{n-k} = h * x, \qquad (2.58)$$

where (a) follows from linearity; and (b) from shift invariance and the definition of the impulse response, defining the convolution.[39]

39 Convolution is sometimes called linear convolution to distinguish it from the circular convolution of finite-length sequences as in Definition 2.9.
Definition 2.7 (Convolution) The convolution between sequences $h$ and $x$ is defined as

$$(Hx)_n = (h * x)_n = \sum_{k\in\mathbb{Z}} x_k h_{n-k} = \sum_{k\in\mathbb{Z}} x_{n-k} h_k, \qquad (2.59)$$

where $H$ is called the convolution operator associated with $h$.


When not clear from the context, we will use a subscript on the convolution operator, $*_n$, to denote the argument over which we perform the convolution (for example, $x_{n-m} *_n \delta_{\ell-n} = \sum_k x_{k-m}\, \delta_{\ell-n+k}$).

Example 2.11 (Solution to the LSI difference equation of the accumulator) Let us go back to the difference equation of the accumulator (2.55), and assume zero initial conditions, $y_{-1} = 0$. According to (2.57), the impulse response for this system (recall that $b_0 = 1$ and $a_1 = -1$), that is, a response to $x_n = \delta_n$, is

$$h_n = \begin{bmatrix} \cdots & 0 & 0 & \boxed{1} & 1 & 1 & \cdots \end{bmatrix}^\top = u_n. \qquad (2.60a)$$

Similarly, the response of the system to $x_n = \delta_{n-k}$ is $h_{n-k}$:

$$h_{n-k} = u_{n-k} = \begin{cases} 1, & n \ge k; \\ 0, & \text{otherwise}. \end{cases} \qquad (2.60b)$$

By linearity, for a general input $x$, we can now use these to get the output using the convolution as in (2.59):

$$y_n \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k u_{n-k} = \cdots + x_{-1} u_{n+1} + x_0 u_n + x_1 u_{n-1} + \cdots = \sum_{k=-\infty}^{n} x_k,$$

where (a) follows both from the convolution expression (2.59) and from (2.60). When the input is zero for $n < 0$, consistent with $y_{-1} = 0$, this reduces to $y_n = \sum_{k=0}^{n} x_k$ for $n \ge 0$. The above result thus performs exactly the accumulating function.
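A quick numerical check of this example, assuming a finite-length input that is zero for $n < 0$ and truncating the Heaviside impulse response to the samples that matter (the input values are arbitrary). NumPy's `np.convolve` returns the linear convolution (2.59) of finitely supported sequences whose first samples sit at index 0:

```python
import numpy as np

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0])   # x_0..x_4; x_n = 0 for n < 0 (illustrative values)
h = np.ones(len(x))                          # h_n = u_n, truncated to n = 0..4

# np.convolve returns the full linear convolution, index 0 <-> n = 0.
# Only n = 0..4 are kept: later samples would need more of the (infinite) u_n.
y = np.convolve(h, x)[: len(x)]

print(y)               # running sums 3, 2, 6, 7, 2
print(np.cumsum(x))    # the same values
```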

Properties The convolution (2.59) satisfies:

(i) Connection to the inner product
$$(h * x)_n = \sum_{k\in\mathbb{Z}} x_k h_{n-k} = \langle x_k, h^*_{n-k}\rangle_k. \qquad (2.61a)$$
(ii) Commutativity
$$h * x = x * h. \qquad (2.61b)$$
(iii) Associativity
$$g * (h * x) = g * h * x = (g * h) * x. \qquad (2.61c)$$
(iv) Deterministic autocorrelation
$$a_n = \sum_{k\in\mathbb{Z}} x_k x^*_{k-n} = x_n *_n x^*_{-n}. \qquad (2.61d)$$
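The properties (ii) to (iv) can be checked on short random sequences, where all sums are finite and therefore converge. This is a sketch assuming NumPy, with sequences stored from index 0. For (iv), `np.convolve(x, np.conj(x[::-1]))` realizes $x_n * x^*_{-n}$, whose first entry corresponds to $n = -(\text{len}(x) - 1)$:

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.standard_normal(4)
h = rng.standard_normal(3)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)

# (ii) commutativity: h * x = x * h
comm_ok = np.allclose(np.convolve(h, x), np.convolve(x, h))

# (iii) associativity: safe here because all sequences are finite, so every sum converges
assoc_ok = np.allclose(np.convolve(g, np.convolve(h, x)),
                       np.convolve(np.convolve(g, h), x))

# (iv) autocorrelation a_n = sum_k x_k conj(x_{k-n}) = x_n * conj(x_{-n})
a = np.convolve(x, np.conj(x[::-1]))      # entry 0 of `a` is lag n = -(len(x) - 1)
lag0 = len(x) - 1
auto_ok = np.isclose(a[lag0], np.sum(np.abs(x) ** 2))   # a_0 = ||x||^2
sym_ok = np.allclose(a, np.conj(a[::-1]))                # a_{-n} = conj(a_n)
print(comm_ok, assoc_ok, auto_ok, sym_ok)                # True True True True
```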

All properties of convolution above depend on the sums (whether written explicitly or implicitly) converging. Convergence of the convolution is discussed in Appendix 2.A.3. The following example illustrates the apparent failure of the associative property when a convolution sum does not converge.

Example 2.12 (When convolution is not associative) Since convolutions may fail to converge, one needs to be careful about associativity. For $g$, choose the Heaviside sequence from (2.10), $g_n = u_n$; for $h$, the first-order differencing sequence, $h_n = \delta_n - \delta_{n-1}$; and for $x$, the constant sequence, $x_n = 1$. Now,

$$g * (h * x) \overset{(a)}{=} u * 0 = 0, \qquad \text{while} \qquad (g * h) * x \overset{(b)}{=} \delta * 1 = 1,$$

where (a) follows because convolving a constant with the differencing operator yields a zero sequence; and (b) because convolving a Heaviside sequence with the differencing operator yields a Kronecker delta sequence. This failure of associativity occurs because $g * h * x$ is not well defined; for it to be well defined requires absolute convergence of

$$\sum_{m\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} g_{n-m}\, h_{m-k}\, x_k$$

for every $n \in \mathbb{Z}$, which does not hold.

Filters The impulse response is often called a filter and the convolution is called filtering. Here are some basic classes of filters:

(i) Causal filters are such that $h_n = 0$ for all $n < 0$.
(ii) Anticausal filters are such that $h_n = 0$ for all $n > 0$.
(iii) Two-sided filters are neither causal nor anticausal.
(iv) Finite impulse response (FIR) filters have only a finite number of coefficients $h_n$ different from zero.
(v) Infinite impulse response (IIR) filters have an infinite number of nonzero terms.

For example, the impulse response in Example 2.11 is causal and IIR.
Stability We now discuss stability of LSI systems. 
Proposition 2.8 An LSI system is BIBO stable if and only if its impulse response is absolutely summable.



Proof. To prove sufficiency (absolute summability implies BIBO stability), consider an absolutely-summable impulse response $h \in \ell^1(\mathbb{Z})$, so $\|h\|_1 < \infty$, and a bounded input $x \in \ell^\infty(\mathbb{Z})$, so $\|x\|_\infty < \infty$. The absolute value of any one sample at the output can be bounded as follows:

$$|y_n| \overset{(a)}{=} \Big|\sum_{k\in\mathbb{Z}} h_k x_{n-k}\Big| \overset{(b)}{\le} \sum_{k\in\mathbb{Z}} |h_k|\,|x_{n-k}| \overset{(c)}{\le} \|x\|_\infty \sum_{k\in\mathbb{Z}} |h_k| \overset{(d)}{=} \|x\|_\infty \|h\|_1 < \infty,$$

where (a) follows from (2.59); (b) from the triangle inequality (Definition 1.9(iii)); (c) from bounding each $|x_{n-k}|$ by $\|x\|_\infty$; and (d) from the definition of the $\ell^1$ norm. This proves that $y$ is bounded.[40]

We prove necessity (BIBO stability implies absolute summability) by contradiction. For any $h$ that is not absolutely summable we choose a particular input $x$ (which depends on $h$) to create an unbounded output. Consider a real impulse response $h$,[41] and define the input sequence to be

$$x_n = \operatorname{sgn}(h_{-n}), \qquad \text{where} \qquad \operatorname{sgn}(t) = \begin{cases} -1, & \text{for } t < 0; \\ 0, & \text{for } t = 0; \\ 1, & \text{for } t > 0, \end{cases}$$

is the sign function. Now, compute the convolution of $x$ with $h$ at $n = 0$:

$$y_0 = \sum_{k\in\mathbb{Z}} h_k x_{-k} = \sum_{k\in\mathbb{Z}} |h_k| = \|h\|_1, \qquad (2.62)$$

which is unbounded when $h$ is not in $\ell^1(\mathbb{Z})$, even though the input is bounded, $|x_n| \le 1$ for all $n$.

The impulse response of the accumulator, for example, does not belong to $\ell^1(\mathbb{Z})$; a bounded input to the accumulator can lead to an unbounded output. Limiting attention to filters in $\ell^1(\mathbb{Z})$ avoids technical difficulties with both convergence of the convolution sum as well as the resulting sequence being in a suitable sequence space. When $h \in \ell^1(\mathbb{Z})$ and $x \in \ell^p(\mathbb{Z})$ for any $p \in [1, \infty]$, the result of $h * x$ is in $\ell^p(\mathbb{Z})$ as well; see Solved Exercise 2.2.
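To illustrate Proposition 2.8 numerically, the sketch below works on finite windows of the doubly infinite sequences and uses an illustrative $\ell^1$ filter $h_n = 0.9^n u_n$ (so $\|h\|_1 = 10$). A constant bounded input through this filter stays below $\|x\|_\infty \|h\|_1$, while the same input through the accumulator ($h_n = u_n$) grows linearly with $n$:

```python
import numpy as np

n = np.arange(2000)
x = np.ones_like(n, dtype=float)        # bounded input, ||x||_inf = 1

h_stable = 0.9 ** n                     # absolutely summable: ||h||_1 = 1/(1 - 0.9) = 10
h_accum = np.ones_like(n, dtype=float)  # accumulator, h_n = u_n, not in l^1

# Keep output samples n = 0..1999 (inputs and filters start at n = 0).
y_stable = np.convolve(h_stable, x)[: len(n)]
y_accum = np.convolve(h_accum, x)[: len(n)]

bound = np.max(np.abs(x)) * np.sum(np.abs(h_stable))  # ||x||_inf * ||h||_1 (truncated h)
print(np.max(np.abs(y_stable)) <= bound + 1e-9)       # True
print(y_accum[-1])                                    # 2000.0: grows without bound in n
```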



40 This boundedness is equivalent to the convergence of the convolution sum as discussed in Appendix 2.A.3.

41 For a complex-valued impulse response, a slight modification, using $x_n = h^*_{-n}/|h_{-n}|$ for $h_{-n} \ne 0$, and $x_n = 0$ otherwise, leads to the same result.
[Figure 2.5 (plots): (a) Sequence, (b) Impulse response, (c) Convolution, (d)-(f) time-reversed and shifted impulse responses such as $h_{-n+1}$.]

Figure 2.5: Example of the convolution between a sequence and a filter. (a) Sequence $x_n$. (b) Impulse response $h_n$. (c) Result of convolution $y_n$. (d) Time-reversed version of the impulse response, $h_{-n}$. (e)-(f) Two time-reversed and shifted versions of the impulse response involved in computing the convolution.



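Matrix View As we have shown in Section 1.5.5, any linear operator can be expressed in matrix form. We may visualize (2.59) as:

$$\begin{bmatrix} \vdots \\ y_{-2} \\ y_{-1} \\ y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix}
= \underbrace{\begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & h_0 & h_{-1} & h_{-2} & h_{-3} & h_{-4} & \cdots \\
\cdots & h_1 & h_0 & h_{-1} & h_{-2} & h_{-3} & \cdots \\
\cdots & h_2 & h_1 & \boxed{h_0} & h_{-1} & h_{-2} & \cdots \\
\cdots & h_3 & h_2 & h_1 & h_0 & h_{-1} & \cdots \\
\cdots & h_4 & h_3 & h_2 & h_1 & h_0 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}}_{H}
\begin{bmatrix} \vdots \\ x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
= Hx. \qquad (2.63)$$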

This again shows that an LSI discrete-time system, linear operator (on sequences), filter and (doubly-infinite) matrix are all synonyms. The key elements in (2.63) are the time reversal of the impulse response (in each row of the matrix, the impulse response goes from right to left), and the Toeplitz structure of the matrix (each row is a shifted version of the previous row, the matrix is constant along diagonals, see (1.228)). In Figure 2.5 an example convolution is computed graphically, emphasizing time reversal.
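A finite section of the matrix in (2.63) can be built directly from its entries $H_{n,k} = h_{n-k}$. The sketch below assumes $h$ and $x$ are supported on $n = 0, \ldots, L-1$ and $n = 0, \ldots, M-1$, so only rows $n = 0, \ldots, L+M-2$ are needed; `conv_matrix` is an illustrative helper name:

```python
import numpy as np

def conv_matrix(h, M):
    """Finite section of (2.63): (L+M-1) x M matrix with H[n, k] = h_{n-k}.

    Assumes h is supported on 0..L-1 and the input on 0..M-1 (illustrative helper).
    """
    h = np.asarray(h)
    L = len(h)
    H = np.zeros((L + M - 1, M), dtype=h.dtype)
    for n in range(L + M - 1):
        for k in range(M):
            if 0 <= n - k < L:          # h_{n-k} is zero outside its support
                H[n, k] = h[n - k]
    return H

h = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, -1.0, 2.0])
H = conv_matrix(h, len(x))

print(np.allclose(H @ x, np.convolve(h, x)))   # True: H x is the convolution h * x
# Toeplitz structure: every diagonal of H is constant.
toeplitz_ok = all(np.allclose(np.diag(H, d), np.diag(H, d)[0])
                  for d in range(-(H.shape[0] - 1), H.shape[1]))
print(toeplitz_ok)                             # True
```

Reading down any column of `H` gives a shifted copy of $h$, while reading along a row gives $h$ time reversed, matching the two key elements named above.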

We can easily find the matrix form of the adjoint of the convolution operator, as the unique $H^*$ satisfying (1.44):

$$\langle Hx, y\rangle = \langle x, H^* y\rangle. \qquad (2.64)$$
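The result is the convolution operator associated with $h^*_{-n}$, the time-reversed and conjugated version of $h_n$:

$$H^* = \begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & h^*_0 & h^*_1 & h^*_2 & h^*_3 & h^*_4 & \cdots \\
\cdots & h^*_{-1} & h^*_0 & h^*_1 & h^*_2 & h^*_3 & \cdots \\
\cdots & h^*_{-2} & h^*_{-1} & \boxed{h^*_0} & h^*_1 & h^*_2 & \cdots \\
\cdots & h^*_{-3} & h^*_{-2} & h^*_{-1} & h^*_0 & h^*_1 & \cdots \\
\cdots & h^*_{-4} & h^*_{-3} & h^*_{-2} & h^*_{-1} & h^*_0 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}. \qquad (2.65)$$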
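To check (2.64) and (2.65) numerically, the sketch below assumes $h$ and $x$ are supported on $n \ge 0$, so that $H$ maps length-$M$ inputs to length-$(L+M-1)$ outputs (a finite section of the doubly infinite $H$). The adjoint is applied as convolution with $h^*_{-n}$, keeping only the samples on the support of $x$; `conv_adjoint` is an illustrative name. With $\langle a, b\rangle = \sum_n a_n b_n^*$, NumPy's `np.vdot(b, a)` computes $\langle a, b\rangle$:

```python
import numpy as np

def conv_adjoint(h, y, m):
    """Apply H* (convolution with g_n = conj(h_{-n})) to y; return samples n = 0..m-1.

    Assumes h is supported on 0..L-1 and y on 0..L+m-2 (full linear convolution).
    """
    L = len(h)
    g = np.conj(h[::-1])            # g_n = conj(h_{-n}), supported on -(L-1)..0
    full = np.convolve(g, y)        # entry j of `full` corresponds to n = j - (L-1)
    return full[L - 1 : L - 1 + m]

rng = np.random.default_rng(0)
h = rng.standard_normal(3) + 1j * rng.standard_normal(3)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(7) + 1j * rng.standard_normal(7)   # same length as h * x

lhs = np.vdot(y, np.convolve(h, x))           # <Hx, y>
rhs = np.vdot(conv_adjoint(h, y, len(x)), x)  # <x, H* y>
print(np.isclose(lhs, rhs))                   # True
```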



Circular Convolution 

We now consider what happens with our second class of sequences, those of finite length that are circularly extended. To start, we assume that the impulse response $h$ is in $\ell^1(\mathbb{Z})$.



Linear Convolution with Circularly-Extended Signal Given a sequence $x$ with circular extension as in (2.25) and a filter $h$ in $\ell^1(\mathbb{Z})$, we can compute the convolution as usual:

$$y_n = (h * x)_n = \sum_{k\in\mathbb{Z}} x_k h_{n-k} = \sum_{k\in\mathbb{Z}} h_k x_{n-k}. \qquad (2.66)$$



Since $x$ is $N$-periodic, $y$ is $N$-periodic as well:

$$y_{n+N} = \sum_{k\in\mathbb{Z}} h_k x_{n+N-k} \overset{(a)}{=} \sum_{k\in\mathbb{Z}} h_k x_{n-k} = y_n,$$

where (a) follows from the periodicity of $x$.

Let us now define a periodized version of $h$, with period $N$, as:

$$h_{N,n} = \sum_{k\in\mathbb{Z}} h_{n-kN}, \qquad (2.67)$$

which converges for every $n$ because $h \in \ell^1(\mathbb{Z})$. We now want to show how we can express the convolution (2.66)[42] in terms of what we define as a circular convolution:

$$\begin{aligned}
(h * x)_n &= \sum_{k\in\mathbb{Z}} h_k x_{n-k} \overset{(a)}{=} \sum_{\ell\in\mathbb{Z}} \sum_{k=\ell N}^{(\ell+1)N-1} h_k x_{n-k}
\overset{(b)}{=} \sum_{\ell\in\mathbb{Z}} \sum_{k'=0}^{N-1} h_{k'+\ell N}\, x_{n-k'-\ell N} \\
&\overset{(c)}{=} \sum_{\ell\in\mathbb{Z}} \sum_{k=0}^{N-1} h_{k+\ell N}\, x_{n-k}
\overset{(d)}{=} \sum_{k=0}^{N-1} \Big(\sum_{\ell\in\mathbb{Z}} h_{k+\ell N}\Big) x_{n-k}
= \sum_{k=0}^{N-1} h_{N,k}\, x_{n-k} \\
&\overset{(e)}{=} \sum_{k=0}^{N-1} h_{N,k}\, x_{(n-k) \bmod N} = (h_N \circledast x)_n, \qquad (2.68)
\end{aligned}$$

where in (a) we split the set of integers into length-$N$ segments; (b) follows from the change of variable $k' = k - \ell N$; (c) follows from periodicity of $x$ and the change of variable $k = k'$; in (d) we were allowed to exchange the order of summation because $h \in \ell^1(\mathbb{Z})$ (see Appendix 1.A.3); the unlabeled equality that follows is the definition (2.67), since summing $h_{k+\ell N}$ over all $\ell \in \mathbb{Z}$ is the same as summing $h_{k-\ell N}$; and (e) follows from periodicity of $x$. The expression above tends to be more convenient as it involves only one period of both $x$ and the periodized version $h_N$ of the impulse response $h$. The relationships between the convolution of a finite-length sequence $x$, circularly extended, with $h$, and the circular convolution of the same sequence $x$ with $h_N$ are shown in Figure 2.6 and explored further in the context of circulant matrices in Solved Exercise 2.5.
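The identity (2.68) can be checked numerically. The sketch assumes a causal, finitely supported $h$ that is longer than $N$ (so periodization actually folds samples together) and an $N$-periodic $x$ given by one period; `periodize` and `circ_conv` are illustrative helper names, with `circ_conv` following the circular convolution formula directly:

```python
import numpy as np

def periodize(h, N):
    """h_{N,n} = sum_k h_{n-kN} for n = 0..N-1, for h supported on 0..len(h)-1."""
    hN = np.zeros(N, dtype=np.asarray(h).dtype)
    for m, hm in enumerate(h):
        hN[m % N] += hm            # h_m contributes to the residue class m mod N
    return hN

def circ_conv(h, x):
    """Circular convolution: y_n = sum_{k=0}^{N-1} x_k h_{(n-k) mod N}."""
    N = len(x)
    return np.array([sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)])

N = 4
x = np.array([1.0, 2.0, 0.0, -1.0])     # one period of the N-periodic x
h = 0.8 ** np.arange(10)                # causal filter with 10 nonzero taps (longer than N)

# Linear convolution (2.66) with the periodic extension, on one period:
# y_n = sum_m h_m x_{(n-m) mod N}, using every nonzero tap of h.
y_lin = np.array([sum(h[m] * x[(n - m) % N] for m in range(len(h))) for n in range(N)])
y_circ = circ_conv(periodize(h, N), x)
print(np.allclose(y_lin, y_circ))       # True
```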

Definition of the Circular Convolution Above in (2.68), we implicitly defined a new form of convolution between a length-$N$ input sequence $x$ and a length-$N$ impulse response $h$:

Definition 2.9 (Circular convolution) The circular convolution between length-$N$ sequences $h$ and $x$ is defined as

$$(Hx)_n = (h \circledast x)_n = \sum_{k=0}^{N-1} x_k\, h_{(n-k) \bmod N} = \sum_{k=0}^{N-1} x_{(n-k) \bmod N}\, h_k, \qquad (2.69)$$

where $H$ is called the circular convolution operator associated with $h$.




The result of the circular convolution is a length- N sequence. While this notion 
of convolution is independent from that of linear convolution, we have just seen 
that the two are related when the input sequence is of finite length (circularly 
extended) but the impulse response of the system is not. We made the connection 
by periodizing that impulse response. 



42 When there is more than one form of convolution involved, we term the one in (2.66) linear convolution.
[Figure 2.6 (plots): (a) $x_n$, (b) $h_n$, (c) $y_n$, (d) the periodized filter, each plotted against $n$.]

Figure 2.6: Convolution of (a) a finite-length sequence $x$, circularly extended (periodic sequence of period $N = 4$), with (b) the filter $h$. (c) Convolving $h$ with the sequence $x$ leads to a finite-length output $y$, circularly extended. (d) The equivalent periodized filter $h_N$, circularly convolved with $x$, leads to the same output as in (c).



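Equivalence of Circular and Linear Convolutions While we have seen that linear and circular convolutions are related, there are instances when the two are equivalent. Assume we have a length-$M$ input $x$ and a length-$L$ impulse response $h$:

$$x = \begin{bmatrix} \cdots & 0 & x_0 & x_1 & \cdots & x_{M-1} & 0 & \cdots \end{bmatrix}^\top, \qquad h = \begin{bmatrix} \cdots & 0 & h_0 & h_1 & \cdots & h_{L-1} & 0 & \cdots \end{bmatrix}^\top. \qquad (2.70)$$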



The result of the linear convolution (2.59) has at most $L + M - 1$ nonzero samples:

$$y = \begin{bmatrix} \cdots & 0 & y_0 & y_1 & \cdots & y_{L+M-2} & 0 & \cdots \end{bmatrix}^\top.$$



While we have chosen to write the sequences as infinite-length vectors, we could have also chosen to write each as a finite-length vector with appropriate length; however, as these lengths are all different, we would have had to choose a common vector length $N$. The choice of this common length is exactly what determines whether the linear and circular convolutions are equivalent, as we show next.



Proposition 2.10 (Equivalence of circular and linear convolutions) Linear and circular convolutions between a length-$M$ sequence $x$ and a length-$L$ sequence $h$ are equivalent when the period of the circular convolution $N$ satisfies

$$N \ge L + M - 1. \qquad (2.71)$$



Proof. Take $x$ and $h$ as in (2.70). The linear and circular convolutions, $y^{(\mathrm{lin})}$ and $y^{(\mathrm{circ})}$, are given by (2.59) and (2.69), respectively:

$$y^{(\mathrm{lin})}_n = (h_0 x_n + \cdots + h_n x_0) + (h_{n+1} x_{-1} + \cdots + h_{L-1} x_{-L+1+n}), \qquad (2.72a)$$
$$y^{(\mathrm{circ})}_n = (h_0 x_n + \cdots + h_n x_0) + (h_{n+1} x_{N-1} + \cdots + h_{L-1} x_{N-L+1+n}), \qquad (2.72b)$$

for $n \in \{0, 1, \ldots, N-1\}$. In the above, we broke each convolution sum into the terms with nonnegative indices of $x$ and the rest (negative ones for the linear convolution, and taken mod $N$ for the circular convolution). Note that in (2.72b) the index of $h$ could go from 0 to $N-1$, but it stops at $L-1$ since $h$ is zero after that.

Since $x$ has no nonzero values for negative indices, the second sum in (2.72a) is zero, and so must the second sum in (2.72b) be (and that for every $n = 0, 1, \ldots, N-1$), if (2.72a) and (2.72b) are to be equal. This, in turn, is possible only if $x_{N-L+1+n}$ (the last $x$ term in the second sum of the circular convolution, which has the smallest index) has an index that is outside the range of nonzero values of $x$, that is, if $N - L + 1 + n \ge M$ for every $n = 0, 1, \ldots, N-1$. As this is true for $n = 0$ by assumption (2.71), it will be true for all larger $n$ as well.

Figure 2.7 depicts this equivalence and Example 2.13 examines it in matrix notation for $M = 4$, $L = 3$.
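A small check of Proposition 2.10 with $M = 4$ and $L = 3$ (the sample values are arbitrary). The helper `circ_conv_padded` is an illustrative name: it zero-pads both sequences to length $N$ and applies the circular convolution formula directly. With $N = 5 < L + M - 1$, the last linear-convolution sample wraps around and is added to $y_0$:

```python
import numpy as np

def circ_conv_padded(h, x, N):
    """Circular convolution with period N after zero-padding h and x to length N."""
    hp = np.concatenate([h, np.zeros(N - len(h))])
    xp = np.concatenate([x, np.zeros(N - len(x))])
    return np.array([sum(xp[k] * hp[(n - k) % N] for k in range(N)) for n in range(N)])

h = np.array([1.0, 2.0, 3.0])              # L = 3
x = np.array([1.0, 1.0, 2.0, -1.0])        # M = 4
y_lin = np.convolve(h, x)                  # length L + M - 1 = 6

print(np.allclose(circ_conv_padded(h, x, 6), y_lin))       # N = 6: True
print(np.allclose(circ_conv_padded(h, x, 7)[:6], y_lin))   # N = 7: True (extra sample is 0)
c5 = circ_conv_padded(h, x, 5)
print(np.allclose(c5, y_lin[:5]))                          # N = 5: False
print(np.isclose(c5[0], y_lin[0] + y_lin[5]))              # True: y_5 aliases onto y_0
```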



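Matrix View As we have done for linear convolution in (2.63), we visualize circular convolution (2.69) using matrices:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{N-1} \end{bmatrix}
= \underbrace{\begin{bmatrix}
h_0 & h_{N-1} & h_{N-2} & \cdots & h_1 \\
h_1 & h_0 & h_{N-1} & \cdots & h_2 \\
h_2 & h_1 & h_0 & \cdots & h_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
h_{N-1} & h_{N-2} & h_{N-3} & \cdots & h_0
\end{bmatrix}}_{H}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{N-1} \end{bmatrix} = Hx. \qquad (2.73)$$

$H$ is a circulant matrix as in (1.227) with $h$ as its first column, and it represents the circular convolution operator when both the sequence $x$ and impulse response $h$ are finite; when the impulse response is not finite, the elements of $H$ would be samples of the periodized impulse response $h_N$.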
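The circulant matrix of (2.73) can be formed from its entries $H_{n,k} = h_{(n-k) \bmod N}$. The sketch below (helper name `circulant` and sample values are illustrative) uses the setting of Example 2.13, a length-3 filter and a length-4 sequence zero-padded to $N = 6$, and checks that each row is the previous one shifted circularly and that $Hx$ equals the linear convolution:

```python
import numpy as np

def circulant(c):
    """N x N circulant matrix as in (2.73): H[n, k] = c_{(n-k) mod N}, first column c."""
    N = len(c)
    return np.array([[c[(n - k) % N] for k in range(N)] for n in range(N)])

h = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])    # length-3 filter, zero-padded to N = 6
x = np.array([1.0, 1.0, 2.0, -1.0, 0.0, 0.0])   # length-4 sequence, zero-padded to N = 6
H = circulant(h)

print(np.allclose(H[:, 0], h))                                          # first column is h
print(all(np.allclose(np.roll(H[n], 1), H[n + 1]) for n in range(5)))   # circular row shifts
print(np.allclose(H @ x, np.convolve(h[:3], x[:4])))                    # equals linear convolution
```

If SciPy is available, `scipy.linalg.circulant` builds the same matrix from its first column.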

[Figure 2.7 (plots): (a) Sequence. (b) Filter. (c) Linear convolution, $y_n = (h \circledast x)_n = (h * x)_n$. (d) Circular convolution, $y_n = (h \circledast x)_n \ne (h * x)_n$.]

Figure 2.7: Equivalence of circular and linear convolutions. (a) Sequence $x$ of length $M = 4$. (b) Filter $h$ of length $L = 3$. (c) Linear convolution results in a sequence of length $L + M - 1 = 6$, the same as a circular convolution with a period $N \ge L + M - 1$, $N = 6$ in this case. (d) However, circular convolution with a smaller period, $N = 5$, does not lead to the same result.

Example 2.13 (Equivalence of circular and linear convolutions) We now look at a length-3 filter convolved with a length-4 sequence. The result of the linear convolution is of length 6:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
= \begin{bmatrix}
h_0 & 0 & 0 & 0 \\
h_1 & h_0 & 0 & 0 \\
h_2 & h_1 & h_0 & 0 \\
0 & h_2 & h_1 & h_0 \\
0 & 0 & h_2 & h_1 \\
0 & 0 & 0 & h_2
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}. \qquad (2.74a)$$



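To calculate the circular convolution, we choose $N = M + L - 1 = 6$, and form a $6 \times 6$ circulant matrix $H$ as in (2.73) by using $h$ as its first column. Then the circular convolution leads to the same result as before:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
= \begin{bmatrix}
h_0 & 0 & 0 & 0 & h_2 & h_1 \\
h_1 & h_0 & 0 & 0 & 0 & h_2 \\
h_2 & h_1 & h_0 & 0 & 0 & 0 \\
0 & h_2 & h_1 & h_0 & 0 & 0 \\
0 & 0 & h_2 & h_1 & h_0 & 0 \\
0 & 0 & 0 & h_2 & h_1 & h_0
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{bmatrix}. \qquad (2.74b)$$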



Had the period $N$ been chosen smaller (for example, $N = 5$), the equivalence would not have held.

This example also shows that to compute the linear convolution, we can compute the circular convolution instead by choosing an appropriate period $N \ge M + L - 1$. This is often done as the circular convolution can be computed using the discrete Fourier transform of size $N$ (see Section 2.9.2), and fast algorithms for the discrete Fourier transform abound (see Section 2.9.1).
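The remark above can be sketched with NumPy's FFT. The sketch assumes the DFT convolution property that Section 2.9.2 develops: the length-$N$ DFT of a circular convolution is the product of the length-$N$ DFTs. Zero-padding to $N = L + M - 1$ then makes the circular result equal the linear one. `fft_linear_conv` is an illustrative name; `np.fft.fft(a, n=N)` zero-pads `a` to length `N`:

```python
import numpy as np

def fft_linear_conv(h, x):
    """Linear convolution via length-N FFTs with N = len(h) + len(x) - 1 (zero-padding)."""
    N = len(h) + len(x) - 1
    # np.fft.fft(a, n=N) zero-pads a to length N before transforming.
    Y = np.fft.fft(h, n=N) * np.fft.fft(x, n=N)
    y = np.fft.ifft(Y)
    # For real inputs, the imaginary part is only rounding noise.
    return y.real if np.isrealobj(h) and np.isrealobj(x) else y

h = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 1.0, 2.0, -1.0])
print(np.allclose(fft_linear_conv(h, x), np.convolve(h, x)))   # True
```

Any $N \ge M + L - 1$ works, so in practice $N$ is often rounded up to a size the FFT handles efficiently and the output truncated to its first $L + M - 1$ samples; SciPy's `scipy.signal.fftconvolve` packages the same idea.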

2.4 Discrete-Time Fourier Transform

In this and the next two sections, we introduce various ways to analyze sequences 
and discrete-time systems. They range from the analytical to the computational 
and are all variations of the Fourier transform. Why this prominent role of Fourier 
methods? Simply because they are based on eigensequences of LSI systems (con- 
volution operators). Thus far, we have seen two convolution operators (linear and 
circular). We will see that these have different sets of eigensequences, which lead 
to different Fourier transforms for sequences. The eigensequence property leads to 
the diagonalization of the convolution operator, which then implies the convolution 
property — an equivalence between convolving sequences and multiplying Fourier 
transforms of the sequences. 

In this section, we introduce the discrete-time Fourier transform (DTFT), the Fourier transform for infinite-length discrete-time sequences. It is a $2\pi$-periodic function of frequency $\omega$ that we write as $X(e^{j\omega})$, with $e^{j\omega}$ clearly in $\mathbb{C}$ and of modulus 1, both to stress periodicity as well as to create a unified notation for the DTFT and the z-transform $X(z)$, which we discuss in Section 2.5. The z-transform has argument $z \in \mathbb{C}$ that may have modulus different from 1. In Section 2.6 we focus on the discrete Fourier transform (DFT), the Fourier transform for both infinite-length periodic sequences as well as length-$N$ sequences circularly extended (both of these can be viewed as existing on a discrete circle of length $N$). The DFT is an $N$-dimensional vector we write as $X$.

2.4.1 Definition of the DTFT 

Eigensequences of the Convolution Operator We start with a fundamental property of LSI systems: every unit-modulus complex exponential sequence is an eigensequence of every LSI system. This follows from the convolution representation of LSI systems (2.59) and a simple computation.
Consider a complex exponential sequence

$$v_n = e^{j\omega n}, \qquad n \in \mathbb{Z}, \qquad (2.75)$$

where $\omega$ is any real number. The quantity $\omega$ is called angular frequency; since $n$ counts samples, it is measured in radians per sample. With $\omega = 2\pi f$, the quantity $f$ is called frequency; it is measured in cycles per sample (for samples taken every $T$ seconds, $f/T$ is the frequency in Hertz, or cycles per second). The sequence $v$ is bounded since $|v_n| = 1$ for all $n \in \mathbb{Z}$. If the impulse response $h$ is in $\ell^1(\mathbb{Z})$, according to Proposition 2.8 the output $h * v$ is bounded as well. Along with being bounded, $h * v$ takes a particular form:

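$$(Hv)_n = (h * v)_n = \sum_{k\in\mathbb{Z}} v_{n-k} h_k = \sum_{k\in\mathbb{Z}} e^{j\omega(n-k)} h_k = \Big(\sum_{k\in\mathbb{Z}} h_k e^{-j\omega k}\Big) e^{j\omega n} = \lambda_\omega v_n. \qquad (2.76)$$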




This shows that applying the convolution operator $H$ to the complex exponential sequence $v$ gives a scalar multiple of $v$. In other words, $v$ is an eigensequence of $H$ with corresponding eigenvalue $\lambda_\omega$ that we call the frequency response of the system, $H(e^{j\omega})$; it is defined formally in (2.105a). We can thus rewrite (2.76) as

$$H e^{j\omega n} = h * e^{j\omega n} = H(e^{j\omega})\, e^{j\omega n}. \qquad (2.77)$$

The scalar multiples of $v$ form a subspace $S_\omega = \{\alpha e^{j\omega n} \mid \alpha \in \mathbb{C}\}$. This space is invariant under the operation of convolution: when the input is in $S_\omega$, the output is in $S_\omega$ as well.
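The eigensequence property can be observed numerically with an FIR filter, as a sketch under the assumption that a finite window of $e^{j\omega n}$ stands in for the infinite sequence. Every output sample whose computation uses only samples inside the window equals $H(e^{j\omega})\, e^{j\omega n}$ with $H(e^{j\omega}) = \sum_k h_k e^{-j\omega k}$. The taps and the frequency below are arbitrary; `mode="valid"` in `np.convolve` keeps exactly those edge-free samples:

```python
import numpy as np

h = np.array([0.5, 1.0, -0.25])          # FIR filter h_0, h_1, h_2 (illustrative)
omega = 0.7                              # any real angular frequency
n = np.arange(50)
v = np.exp(1j * omega * n)               # window of v_n = e^{j omega n}

# 'valid' keeps outputs with full overlap: here n = 2, ..., 49, free of edge effects.
y = np.convolve(h, v, mode="valid")
H = np.sum(h * np.exp(-1j * omega * np.arange(len(h))))   # eigenvalue H(e^{j omega})
print(np.allclose(y, H * v[len(h) - 1:]))                 # True: output = H(e^{j omega}) v_n
```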

DTFT Finding the appropriate Fourier transform now amounts to projecting onto each of the invariant subspaces $S_\omega$.

Definition 2.11 (Discrete-time Fourier transform) The discrete-time Fourier transform of a sequence $x$ is

$$X(e^{j\omega}) = \sum_{n\in\mathbb{Z}} x_n e^{-j\omega n}, \qquad \omega \in \mathbb{R}. \qquad (2.78a)$$

It exists when (2.78a) converges for all $\omega \in \mathbb{R}$; we then call it the spectrum of $x$. The inverse DTFT of a $2\pi$-periodic function $X(e^{j\omega})$ is

$$x_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega, \qquad n \in \mathbb{Z}. \qquad (2.78b)$$

When the DTFT exists, we denote the DTFT pair as

$$x_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; X(e^{j\omega}).$$
Since $e^{-j\omega n}$ is a $2\pi$-periodic function of $\omega$ for every $n \in \mathbb{Z}$, the DTFT is always a $2\pi$-periodic function, which is emphasized by the notation $X(e^{j\omega})$. Note that the sum in (2.78a) is formally equivalent to an $\ell^2(\mathbb{Z})$ inner product, although the sequence $e^{j\omega n}$ has no decay and is thus not in $\ell^2(\mathbb{Z})$. We now discuss limitations on the inputs and the corresponding types of convergence.
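For a finitely supported sequence the sum (2.78a) has finitely many terms, so it can be evaluated directly. The sketch below (the helper `dtft`, the support start `n0` and the sample values are illustrative) evaluates $X(e^{j\omega})$ on a uniform grid, checks $2\pi$-periodicity, and recovers $x_n$ from (2.78b). For such a trigonometric polynomial, a uniform Riemann sum over $[-\pi, \pi)$ reproduces the integral up to rounding whenever the number of grid points exceeds the largest $|n - k|$ involved:

```python
import numpy as np

def dtft(x, n0, omega):
    """X(e^{j omega}) = sum_n x_n e^{-j omega n} for x supported on n0, ..., n0+len(x)-1."""
    n = n0 + np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x

x = np.array([1.0, -2.0, 0.5, 3.0])   # x_{-1}, x_0, x_1, x_2 (illustrative)
n0 = -1

K = 64
omega = -np.pi + 2 * np.pi * np.arange(K) / K          # uniform grid on [-pi, pi)
X = dtft(x, n0, omega)
print(np.allclose(X, dtft(x, n0, omega + 2 * np.pi)))  # True: 2*pi-periodic

# Inverse DTFT (2.78b) as a Riemann sum: (1/2pi) * sum_i X(w_i) e^{j w_i n} * (2pi/K)
n = n0 + np.arange(len(x))
x_rec = (np.exp(1j * np.outer(n, omega)) @ X) / K
print(np.allclose(x_rec, x))                           # True
```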

2.4.2 Existence and Convergence of the DTFT 

The existence of the DTFT depends on the sequence $x$. When a doubly-infinite series as in (2.78a) is given without a specification of how to interpret it as a limiting process, one must consider the series well defined only when it converges absolutely (see Appendix 1.A.2). This immediately implies existence of the DTFT for all sequences in $\ell^1(\mathbb{Z})$. To extend beyond $\ell^1(\mathbb{Z})$, we consider the limit as $N \to \infty$ of the partial sums

$$X_N(e^{j\omega}) = \sum_{n=-N}^{N} x_n e^{-j\omega n}. \qquad (2.79)$$

Convergence of the partial sums under the $\mathcal{L}^2([-\pi, \pi))$ norm allows us to consider the DTFT to exist for all sequences in $\ell^2(\mathbb{Z})$. The DTFT can be a useful tool even when (2.78a) diverges to $\infty$ for some values of $\omega$; this requires more caution.

Sequences in $\ell^1(\mathbb{Z})$ If $x \in \ell^1(\mathbb{Z})$, then (2.78a) converges absolutely for every $\omega$, since

$$\sum_{n\in\mathbb{Z}} |x_n e^{-j\omega n}| = \sum_{n\in\mathbb{Z}} |x_n|\,|e^{-j\omega n}| = \|x\|_1 < \infty.$$

This tells us that the DTFT of $x$ exists. Moreover, as a consequence of absolute convergence for all $\omega$, the limit $X(e^{j\omega})$ is a continuous function of $\omega$.[43]

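Since the DTFT itself is well defined, we can verify the inversion formula by substituting (2.78a) into (2.78b). First,

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \Big(\sum_{k\in\mathbb{Z}} x_k e^{-j\omega k}\Big) e^{j\omega n}\, d\omega \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k\, \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega(n-k)}\, d\omega, \qquad (2.80a)$$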

where in (a) we are allowed to exchange the order of summation and integration because $x \in \ell^1(\mathbb{Z})$ (see Appendix 1.A.3). The integral $\int_{-\pi}^{\pi} e^{j\omega(n-k)}\, d\omega$ must be treated separately for $n = k$ and $n \ne k$. Each case gives an elementary computation, and the result is

$$\int_{-\pi}^{\pi} e^{j\omega(n-k)}\, d\omega = 2\pi\, \delta_{n-k}, \qquad (2.80b)$$



43 Absolute convergence of (2.78a) implies uniform convergence of the sequence of functions $X_N$ in (2.79) to $X$. Looking at the DTFT as a function defined on a compact (closed and bounded) domain such as $[-\pi, \pi]$, the uniform convergence and the continuity of each $X_N$ imply that $X$ is continuous.



α3.0 [October 2011] CC by-nc-nd Comments to book-errata@FourierAndWavelets.org

Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

208 Chapter 2. Sequences and Discrete-Time Systems

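using the Kronecker delta sequence to combine the cases. We can then rewrite (2.80a) as

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \Big(\sum_{k\in\mathbb{Z}} x_k e^{-j\omega k}\Big) e^{j\omega n}\, d\omega \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k \delta_{n-k} \overset{(b)}{=} x_n, \qquad (2.80c)$$

where (a) follows from (2.80b); and (b) from the definition of the Kronecker delta sequence (2.7), proving the inversion.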

Sequences in $\ell^2(\mathbb{Z})$ For sequences not in $\ell^1(\mathbb{Z})$, the DTFT series (2.78a) may fail to converge for some values of $\omega$. Nevertheless, convergence can be extended to the larger space of sequences $\ell^2(\mathbb{Z})$ by changing the sense of convergence.

If $x \in \ell^2(\mathbb{Z})$, the partial sum $X_N(e^{j\omega})$ in (2.79) converges to a function $X(e^{j\omega}) \in \mathcal{L}^2([-\pi, \pi))$ in the sense that

$$\lim_{N\to\infty} \|X(e^{j\omega}) - X_N(e^{j\omega})\| = 0. \qquad (2.81)$$

This convergence in $\mathcal{L}^2([-\pi, \pi))$ norm[44] implies convergence of (2.78a) for almost all values of $\omega$, but there is no guarantee of the convergence being uniform or the limit function $X(e^{j\omega})$ being continuous.

The sense in which the inversion formula holds changes subtly as well. We return to this in Section 3.5.1.

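Example 2.14 (Mean-square convergence of DTFT) Take the sinc sequence from Figure 2.1(b),

$$x_n = \frac{1}{\sqrt{2}} \operatorname{sinc}(\pi n/2) = \frac{1}{\sqrt{2}} \frac{\sin(\pi n/2)}{\pi n/2}. \qquad (2.82)$$

It decays too slowly to be absolutely summable but fast enough to be square summable; that is, $x \in \ell^2(\mathbb{Z})$ but $x \notin \ell^1(\mathbb{Z})$. Thus, we cannot guarantee that (2.78a) converges for every $\omega$, but the DTFT still converges in mean square. To see this, in Figure 2.8 we plot the DTFT partial sum (2.79),

$$X_N(e^{j\omega}) = \frac{1}{\sqrt{2}} \sum_{n=-N}^{N} \frac{\sin(\pi n/2)}{\pi n/2}\, e^{-j\omega n}, \qquad (2.83)$$

(with the $n = 0$ term equal to $1/\sqrt{2}$) for various values of $N$. As the figure suggests, convergence in mean square is to

$$X(e^{j\omega}) = \begin{cases} \sqrt{2}, & \text{for } |\omega| < \pi/2; \\ 0, & \text{for } |\omega| \in (\pi/2, \pi], \end{cases}$$

and the convergence as $N \to \infty$ is nonuniform: it is very slow near $\omega = \pi/2$ and faster farther away. In fact, at $\omega = \pi/2$ every partial sum equals $\sqrt{2}/2$, the midpoint of the jump, so the partial sums approach neither one-sided value of $X$ there; this behavior at an isolated point does not prevent convergence in mean square.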



44 This is also called convergence in the mean-square sense.
[Figure 2.8 (plots): $|X_N(e^{j\omega})|$ against $\omega$ for (a) $N = 10$, (b) $N = 100$, (c) $N = 1000$, and (d) a detail of (c).]

Figure 2.8: Truncated DTFT of the sinc sequence, illustrating the Gibbs phenomenon. Shown are $|X_N(e^{j\omega})|$ from (2.83) with different $N$. Observe how oscillations narrow from (a) to (c), but their amplitude remains constant (the topmost grid line in every plot), $1.089\sqrt{2}$.



The partial sum $X_N(e^{j\omega})$ oscillates near the points of discontinuity, with oscillations becoming narrower as $N$ increases but not decreasing in size. This over- and undershoot is of the order of 9%, and is called the Gibbs phenomenon
(see also Figure \0M ffl 
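The behavior in Figure 2.8 can be reproduced with a short computation of the partial sum (2.83). Because $x_n$ is real and even, $X_N(e^{j\omega}) = x_0 + 2\sum_{n=1}^{N} x_n \cos(\omega n)$. The sketch below (grid and $N$ chosen for illustration; `partial_dtft_sinc` is a hypothetical helper name) checks that the peak just below $\omega = \pi/2$ stays close to $1.089\sqrt{2}$, and that at $\omega = \pi/2$ itself the partial sum equals $\sqrt{2}/2$, since $\sin(\pi n/2)\cos(\pi n/2) = \sin(\pi n)/2 = 0$ for every integer $n$:

```python
import numpy as np

def partial_dtft_sinc(omega, N):
    """X_N(e^{j omega}) of (2.83) for x_n = sin(pi n/2)/(pi n/2)/sqrt(2).

    Uses the even symmetry x_{-n} = x_n: X_N = x_0 + 2 sum_{n=1}^{N} x_n cos(omega n).
    """
    n = np.arange(1, N + 1)
    x = np.sin(np.pi * n / 2) / (np.pi * n / 2) / np.sqrt(2)   # x_n for n >= 1
    x0 = 1 / np.sqrt(2)                                        # n = 0 term (sinc(0) = 1)
    return x0 + 2 * np.cos(np.outer(omega, n)) @ x

N = 1000
omega = np.linspace(np.pi / 2 - 0.02, np.pi / 2, 4001)   # fine grid just below the jump
XN = partial_dtft_sinc(omega, N)

print(XN.max() / np.sqrt(2))                              # approximately 1.089 (Gibbs overshoot)
print(partial_dtft_sinc(np.array([np.pi / 2]), N)[0])    # approximately 0.7071 = sqrt(2)/2
```

Rerunning with a larger `N` narrows the oscillations but leaves the peak ratio essentially unchanged, which is the Gibbs phenomenon in numbers.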

Using the DTFT Without Convergence The DTFT is still a useful tool even in 
some cases where it converges neither pointwise over u> nor in mean square. These 
are cases where an expression for the DTFT involving a Dirac delta function makes 
sense because evaluating the inverse DTFT gives the desired result. As with other 
uses of the Dirac delta function, we must be cautious. 

Consider the constant sequence $x_n = 1$ for all $n \in \mathbb{Z}$. This sequence belongs to neither $\ell^1(\mathbb{Z})$ nor $\ell^2(\mathbb{Z})$, so neither of our previous discussions of convergence applies. In fact, there is no value of $\omega$ for which the DTFT series (2.78a) converges. However, the lack of convergence is not the same for all values of $\omega$. When $\omega$ is an integer multiple of $2\pi$, (2.78a) diverges to $\infty$ because every term in the sum is 1. For other values of $\omega$, it is tempting (but not mathematically correct; see Appendix 1.A.2) to assign the value of zero to the sum because the terms in (2.78a) all lie on the unit circle, with no direction preferred. This gives some intuition for considering the DTFT to be

$$X(e^{j\omega}) = \begin{cases} \infty, & \text{for } \omega = 0; \\ 0, & \text{otherwise} \end{cases}$$

on $[-\pi, \pi]$. This mathematically nonsensical statement can be replaced by the dangerous but useful statement

$$X(e^{j\omega}) = 2\pi\, \delta(\omega) \qquad \text{for } \omega \in [-\pi, \pi].$$

Substituting this in the inverse DTFT (2.78b) recovers the sequence $x_n = 1$ for all $n \in \mathbb{Z}$ that we started with. A very similar argument supports assigning the DTFT of $2\pi\, \delta(\omega - \omega_0)$ to the complex exponential sequence $e^{j\omega_0 n}$ for any $\omega_0 \in (-\pi, \pi)$.

45 For any piecewise continuously differentiable function with a discontinuity of height $a$, the overshoot is $0.089a$, roughly 9% of the height of the jump.

2.4.3 Properties of the DTFT 

We list here the basic properties of the DTFT; Table 2.4 summarizes these, together with symmetries as well as a few standard transform pairs. Of course, all the expressions must be well defined for these properties to hold.

Linearity The DTFT operator $\mathcal{F}$ is a linear operator, or,

$$\alpha x_n + \beta y_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; \alpha X(e^{j\omega}) + \beta Y(e^{j\omega}). \qquad (2.84)$$

Shift in Time The DTFT pair corresponding to a shift in time by $n_0$ is

$$x_{n-n_0} \;\overset{\text{DTFT}}{\longleftrightarrow}\; e^{-j\omega n_0} X(e^{j\omega}). \qquad (2.85)$$

Shift in Frequency The DTFT pair corresponding to a shift in frequency by $\omega_0$ is

$$e^{j\omega_0 n} x_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; X(e^{j(\omega-\omega_0)}). \qquad (2.86)$$

The shift in time and shift in frequency are the first of several Fourier transform properties that are duals in that swapping the roles of time and frequency results in a pair of similar statements.[46] Shifting in time is equivalent to modulation in frequency, and shifting in frequency is equivalent to modulation in time.

Scaling in Time Scaling in time appears in two flavors:

(i) The DTFT pair corresponding to scaling in time by $N$ is

$$x_{Nn} \;\overset{\text{DTFT}}{\longleftrightarrow}\; \frac{1}{N} \sum_{k=0}^{N-1} X(e^{j(\omega - 2\pi k)/N}). \qquad (2.87)$$

This type of scaling is referred to as downsampling; we will discuss it in more detail in Section 2.7.



46 In this section, time is discrete and frequency is continuous; dualities are more transparent when both are discrete (see Section 2.6) or both are continuous (see Section 3.4).
(ii) The DTFT pair corresponding to scaling in time by $1/N$ is

$$\begin{cases} x_{n/N}, & \text{for } n = \ell N,\ \ell \in \mathbb{Z}; \\ 0, & \text{otherwise}, \end{cases} \;\overset{\text{DTFT}}{\longleftrightarrow}\; X(e^{jN\omega}). \qquad (2.88)$$

This type of scaling is referred to as upsampling; we will discuss it in more detail in Section 2.7.
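Both scaling pairs can be checked numerically on a finitely supported sequence, where every DTFT is a finite sum. This is a sketch: the helper `dtft`, the random sequence and $N = 3$ are illustrative choices. Downsampling keeps $x_{Nn}$ and matches the average of the $N$ stretched and shifted copies of $X$ in (2.87); upsampling inserts $N - 1$ zeros between samples and matches $X(e^{jN\omega})$ in (2.88):

```python
import numpy as np

def dtft(x, omega):
    """DTFT of x supported on n = 0..len(x)-1, evaluated at the angles in omega."""
    return np.exp(-1j * np.outer(omega, np.arange(len(x)))) @ x

rng = np.random.default_rng(2)
x = rng.standard_normal(12)
N = 3
omega = np.linspace(-np.pi, np.pi, 101)

# Downsampling (2.87): x_{Nn} <-> (1/N) sum_k X(e^{j(omega - 2 pi k)/N})
x_down = x[::N]
rhs = sum(dtft(x, (omega - 2 * np.pi * k) / N) for k in range(N)) / N
print(np.allclose(dtft(x_down, omega), rhs))                  # True

# Upsampling (2.88): N - 1 zeros between samples <-> X(e^{jN omega})
x_up = np.zeros(N * len(x))
x_up[::N] = x
print(np.allclose(dtft(x_up, omega), dtft(x, N * omega)))     # True
```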

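Time Reversal The DTFT pair corresponding to a time reversal $x_{-n}$ is

$$x_{-n} \;\overset{\text{DTFT}}{\longleftrightarrow}\; X(e^{-j\omega}). \qquad (2.89)$$

For a real $x_n$, the DTFT of the time-reversed version $x_{-n}$ is $X^*(e^{j\omega})$.

Differentiation The DTFT pair corresponding to differentiation in frequency is

$$(-jn)^k x_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; \frac{d^k X(e^{j\omega})}{d\omega^k}. \qquad (2.90)$$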

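Moments Computing the $k$th moment using the DTFT results in

$$m_k = \sum_{n\in\mathbb{Z}} n^k x_n = \Big(\sum_{n\in\mathbb{Z}} n^k x_n e^{-j\omega n}\Big)\Big|_{\omega=0} = j^k \frac{d^k X(e^{j\omega})}{d\omega^k}\Big|_{\omega=0}, \qquad k \in \mathbb{N}, \qquad (2.91a)$$

as a direct application of (2.90). The first two moments are:

$$m_0 = \sum_{n\in\mathbb{Z}} x_n = X(e^{j\omega})\big|_{\omega=0}, \qquad (2.91b)$$
$$m_1 = \sum_{n\in\mathbb{Z}} n\, x_n = j\, \frac{dX(e^{j\omega})}{d\omega}\Big|_{\omega=0}. \qquad (2.91c)$$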
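A finite-difference check of (2.91b) and (2.91c), assuming a short illustrative sequence and a central difference with step `eps` as the approximation of the derivative at $\omega = 0$:

```python
import numpy as np

x = np.array([0.5, 2.0, 1.0, -0.5])   # x_{-1}, x_0, x_1, x_2 (illustrative)
n = np.arange(-1, 3)

def X(omega):
    """DTFT (2.78a) of the finite sequence above at a scalar angle omega."""
    return np.sum(x * np.exp(-1j * omega * n))

m0 = np.sum(x)          # zeroth moment: 3.0
m1 = np.sum(n * x)      # first moment: -0.5

eps = 1e-5
dX0 = (X(eps) - X(-eps)) / (2 * eps)   # central difference for dX/domega at omega = 0
print(np.isclose(X(0.0), m0))          # (2.91b): m_0 = X(e^{j0})
print(np.isclose(1j * dX0, m1))        # (2.91c): m_1 = j dX/domega at omega = 0
```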



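Convolution in Time The DTFT pair corresponding to convolution in time is

$$(h * x)_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; H(e^{j\omega})\, X(e^{j\omega}). \qquad (2.92)$$

First, a direct algebraic proof: The spectrum $Y(e^{j\omega})$ of the output sequence $y = h * x$ can be written as

$$\begin{aligned}
Y(e^{j\omega}) &\overset{(a)}{=} \sum_{n\in\mathbb{Z}} y_n e^{-j\omega n} \overset{(b)}{=} \sum_{n\in\mathbb{Z}} \Big(\sum_{k\in\mathbb{Z}} x_k h_{n-k}\Big) e^{-j\omega n} = \sum_{n\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} x_k e^{-j\omega k}\, h_{n-k} e^{-j\omega(n-k)} \\
&\overset{(c)}{=} \sum_{k\in\mathbb{Z}} x_k e^{-j\omega k} \sum_{n\in\mathbb{Z}} h_{n-k} e^{-j\omega(n-k)} \overset{(d)}{=} X(e^{j\omega})\, H(e^{j\omega}), \qquad (2.93)
\end{aligned}$$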
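$$
\begin{array}{lll}
\hline
\textbf{DTFT properties} & \textbf{Time domain} & \textbf{DTFT domain} \\
\hline
\textit{Basic properties} & & \\
\text{Linearity} & \alpha x_n + \beta y_n & \alpha X(e^{j\omega}) + \beta Y(e^{j\omega}) \\
\text{Shift in time} & x_{n-n_0} & e^{-j\omega n_0} X(e^{j\omega}) \\
\text{Shift in frequency} & e^{j\omega_0 n} x_n & X(e^{j(\omega-\omega_0)}) \\
\text{Scaling in time: downsampling} & x_{Nn} & \frac{1}{N}\sum_{k=0}^{N-1} X(e^{j(\omega-2\pi k)/N}) \\
\text{Scaling in time: upsampling} & x_{n/N} \text{ for } n = \ell N;\ 0 \text{ otherwise} & X(e^{jN\omega}) \\
\text{Time reversal} & x_{-n} & X(e^{-j\omega}) \\
\text{Differentiation in freq.} & (-jn)^k x_n & \partial^k X(e^{j\omega})/\partial\omega^k \\
\text{Moments} & m_k = \sum_{n} n^k x_n = j^k\, \partial^k X(e^{j\omega})/\partial\omega^k \big|_{\omega=0} & \\
\text{Convolution in time} & (h * x)_n & H(e^{j\omega})\, X(e^{j\omega}) \\
\text{Convolution in frequency} & h_n x_n & \frac{1}{2\pi}(H \circledast X)(e^{j\omega}) \\
\text{Deterministic autocorrelation} & a_n = \sum_{k} x_k x^*_{k-n} & A(e^{j\omega}) = |X(e^{j\omega})|^2 \\
\text{Deterministic crosscorrelation} & c_n = \sum_{k} x_k y^*_{k-n} & C(e^{j\omega}) = X(e^{j\omega})\, Y^*(e^{j\omega}) \\
\text{Parseval's equality} & \|x\|^2 = \sum_{n} |x_n|^2 & \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega \\
\hline
\textit{Symmetries} & & \\
\text{Conjugate} & x^*_n & X^*(e^{-j\omega}) \\
\text{Conjugate, time reversed} & x^*_{-n} & X^*(e^{j\omega}) \\
\text{Real part} & \operatorname{Re}(x_n) & (X(e^{j\omega}) + X^*(e^{-j\omega}))/2 \\
\text{Imaginary part} & \operatorname{Im}(x_n) & (X(e^{j\omega}) - X^*(e^{-j\omega}))/(2j) \\
\text{Conjugate-symmetric part} & (x_n + x^*_{-n})/2 & \operatorname{Re}(X(e^{j\omega})) \\
\text{Conjugate-antisymmetric part} & (x_n - x^*_{-n})/(2j) & \operatorname{Im}(X(e^{j\omega})) \\
\hline
\textit{Symmetries for real } x & & \\
X \text{ conjugate symmetric} & & X(e^{j\omega}) = X^*(e^{-j\omega}) \\
\text{Real part of } X \text{ even} & & \operatorname{Re} X(e^{j\omega}) = \operatorname{Re} X(e^{-j\omega}) \\
\text{Imaginary part of } X \text{ odd} & & \operatorname{Im} X(e^{j\omega}) = -\operatorname{Im} X(e^{-j\omega}) \\
\text{Magnitude of } X \text{ even} & & |X(e^{j\omega})| = |X(e^{-j\omega})| \\
\text{Phase of } X \text{ odd} & & \arg X(e^{j\omega}) = -\arg X(e^{-j\omega}) \\
\hline
\textit{Common transform pairs} & & \\
\text{Kronecker delta sequence} & \delta_n & 1 \\
\text{Shift by } k & \delta_{n-k} & e^{-j\omega k} \\
\text{Constant} & 1 & 2\pi\, \delta(\omega) \\
\text{Exponential sequence} & \alpha^n u_n & 1/(1 - \alpha e^{-j\omega}),\ |\alpha| < 1 \\
\text{Differentiation} & n\, \alpha^n u_n & \alpha e^{-j\omega}/(1 - \alpha e^{-j\omega})^2,\ |\alpha| < 1 \\
\text{Ideal lowpass filter} & \sqrt{\omega_0/(2\pi)}\, \operatorname{sinc}(\omega_0 n/2) & \sqrt{2\pi/\omega_0} \text{ for } |\omega| \le \omega_0/2;\ 0 \text{ otherwise} \\
\text{Box sequence} & 1/\sqrt{n_0} \text{ for } |n| \le (n_0 - 1)/2;\ 0 \text{ otherwise} & \sqrt{n_0}\, \operatorname{sinc}(\omega n_0/2)/\operatorname{sinc}(\omega/2) \\
\hline
\end{array}
$$

Table 2.4: Properties of the DTFT.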
where (a) follows from the definition of the DTFT; (b) from the definition of convolution; (c) from interchanging the order of summation, an allowed operation since absolute summability is implied by $h * x$ being well defined; and (d) from the definition of the DTFT.
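A quick numerical check of the convolution property (2.92), as a sketch under stated assumptions: two random finite-support sequences (an arbitrary choice), a hypothetical helper `dtft` that evaluates the DTFT sum directly on a frequency grid, and `np.convolve` for the time-domain convolution. The output support starts at the sum of the two starting indices.

```python
import numpy as np


def dtft(x, n0, w):
    """Evaluate the DTFT of x_n, supported on n = n0, ..., n0 + len(x) - 1, at the frequencies w."""
    n = n0 + np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x


rng = np.random.default_rng(0)
h = rng.standard_normal(5)  # h_n on n = 0, ..., 4
x = rng.standard_normal(8)  # x_n on n = -3, ..., 4
y = np.convolve(h, x)       # (h * x)_n on n = -3, ..., 8

w = np.linspace(-np.pi, np.pi, 257)
Y = dtft(y, -3, w)
HX = dtft(h, 0, w) * dtft(x, -3, w)
print(np.max(np.abs(Y - HX)))  # at the level of floating-point round-off
```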

This key result is a direct consequence of the eigensequence property of complex exponential sequences $v$ from (2.75): when $x$ is written as a combination of spectral components, each spectral component is simply scaled by the corresponding eigenvalue of the convolution operator; thus, using the DTFT has diagonalized the convolution operator.

Convolution in Frequency The DTFT pair corresponding to convolution in frequency is

\[ h_n x_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; \frac{1}{2\pi}\,(H \circledast X)(e^{j\omega}), \tag{2.94} \]

where we have introduced the circular convolution between 2π-periodic functions

\[ (H \circledast X)(e^{j\omega}) = \int_{-\pi}^{\pi} X(e^{j\theta})\, H(e^{j(\omega-\theta)})\, d\theta. \tag{2.95} \]

Convolution in frequency is often referred to as modulation in time, and it is dual to convolution in time (see Exercise 2.6).

Deterministic Autocorrelation The DTFT pair corresponding to the deterministic autocorrelation of a sequence $x$ is

\[ a_n = \sum_{k\in\mathbb{Z}} x_k x_{k-n}^* \;\overset{\text{DTFT}}{\longleftrightarrow}\; A(e^{j\omega}) = |X(e^{j\omega})|^2 \tag{2.96} \]

and satisfies

\[ A(e^{j\omega}) = A^*(e^{j\omega}), \tag{2.97a} \]

\[ A(e^{j\omega}) \ge 0. \tag{2.97b} \]

Thus, $A(e^{j\omega})$ is not only real, (2.97a), but nonnegative as well, (2.97b). To verify (2.96), express the deterministic autocorrelation as a convolution of $x$ and its time-reversed version as in (2.61d), $a_n = x_n * x_{-n}^*$. We know from Table 2.4 that the DTFT of $x_{-n}^*$ is $X^*(e^{j\omega})$. Then, using the convolution property (2.92), we obtain (2.96).

For a real $x$,

\[ A(e^{j\omega}) = |X(e^{j\omega})|^2 = A(e^{-j\omega}), \tag{2.97c} \]

since $X(e^{-j\omega}) = X^*(e^{j\omega})$. The quantity $A(e^{j\omega})$ is often called the energy spectral density (the deterministic counterpart of the power spectral density for WSS sequences⁴⁷ in (2.232)). The energy is the integral of the energy spectral density over the frequency range,

\[ E = \frac{1}{2\pi}\int_{-\pi}^{\pi} A(e^{j\omega})\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega = \sum_{n\in\mathbb{Z}} |x_n|^2 = a_0. \tag{2.98} \]



47 WSS stands for wide-sense stationary, defined in (2.224).
Thus, the energy spectral density measures the distribution of energy over the frequency range. Mimicking the relationship between the energy spectral density for deterministic sequences and the power spectral density for WSS sequences, (2.98) is the deterministic counterpart of the power for WSS sequences (2.233).
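The relation (2.96) and the energy identity (2.98) can be checked on a short complex sequence. This sketch computes $a_n$ exactly as in the verification of (2.96), as the convolution of $x_n$ with $x^*_{-n}$; the helper `dtft` and the example values are illustrative assumptions.

```python
import numpy as np


def dtft(x, n0, w):
    """Evaluate the DTFT of x_n, supported on n = n0, ..., n0 + len(x) - 1, at the frequencies w."""
    n = n0 + np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x


x = np.array([1.0 + 1.0j, 2.0, -0.5j, 0.25])  # complex example, x_n on n = 0, ..., 3
L = len(x)

# a_n = sum_k x_k x*_{k-n} = (x_n * x*_{-n}), supported on n = -(L-1), ..., L-1
a = np.convolve(x, np.conj(x[::-1]))

w = np.linspace(-np.pi, np.pi, 101)
A = dtft(a, -(L - 1), w)
X = dtft(x, 0, w)

print(np.allclose(A, np.abs(X) ** 2))         # (2.96): A is |X|^2, hence real and nonnegative
print(a[L - 1].real, np.sum(np.abs(x) ** 2))  # a_0 equals the energy, as in (2.98)
```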

Deterministic Crosscorrelation The DTFT pair corresponding to the deterministic crosscorrelation of sequences $x$ and $y$ is

\[ c_n = \sum_{k\in\mathbb{Z}} x_k y_{k-n}^* \;\overset{\text{DTFT}}{\longleftrightarrow}\; C_{x,y}(e^{j\omega}) = X(e^{j\omega})\, Y^*(e^{j\omega}), \tag{2.99} \]

and satisfies

\[ C_{x,y}(e^{j\omega}) = C_{y,x}^*(e^{j\omega}). \tag{2.100a} \]

For $x$, $y$ real,

\[ C_{x,y}(e^{j\omega}) = X(e^{j\omega})\, Y(e^{-j\omega}) = C_{y,x}(e^{-j\omega}). \tag{2.100b} \]

Further properties of deterministic autocorrelation and crosscorrelation sequences and their transforms are explored in Exercise 2.5.



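Deterministic Autocorrelation of Vector Sequences The DTFT pair corresponding to the deterministic autocorrelation of a vector sequence $x$ is

\[ A_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; A(e^{j\omega}) = \begin{bmatrix} A_0(e^{j\omega}) & C_{0,1}(e^{j\omega}) & \cdots & C_{0,N-1}(e^{j\omega}) \\ C_{1,0}(e^{j\omega}) & A_1(e^{j\omega}) & \cdots & C_{1,N-1}(e^{j\omega}) \\ \vdots & \vdots & \ddots & \vdots \\ C_{N-1,0}(e^{j\omega}) & C_{N-1,1}(e^{j\omega}) & \cdots & A_{N-1}(e^{j\omega}) \end{bmatrix}, \tag{2.101} \]

where $A_n$ is given in (2.22). Because of (2.97a) and (2.100a), this energy spectral density matrix is Hermitian, that is,

\[ A(e^{j\omega}) = A^*(e^{j\omega}), \tag{2.102a} \]

where $^*$ here denotes the conjugate (Hermitian) transpose. For a real $x$,

\[ A(e^{j\omega}) = A^{T}(e^{-j\omega}). \tag{2.102b} \]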



Parseval's Equality As noted earlier, from the form of (2.78a), the DTFT is a linear operator from the space of sequences to the space of 2π-periodic functions. Let us denote this through $X = Fx$. We have $F : \ell^2(\mathbb{Z}) \to \mathcal{L}^2([-\pi, \pi))$ because $x \in \ell^2(\mathbb{Z})$ implies that $X(e^{j\omega})$ has finite $\mathcal{L}^2([-\pi, \pi))$ norm. Specifically,
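\[ \begin{aligned} \|X\|^2 &\overset{(a)}{=} \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega = \int_{-\pi}^{\pi} X(e^{j\omega})\, X^*(e^{j\omega})\, d\omega \\ &\overset{(b)}{=} \int_{-\pi}^{\pi} \Bigl(\sum_{n\in\mathbb{Z}} x_n e^{-j\omega n}\Bigr)\Bigl(\sum_{k\in\mathbb{Z}} x_k^* e^{j\omega k}\Bigr) d\omega \\ &\overset{(c)}{=} \sum_{n\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} x_n x_k^* \int_{-\pi}^{\pi} e^{-j\omega(n-k)}\, d\omega \overset{(d)}{=} \sum_{n\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} x_n x_k^*\, 2\pi\, \delta_{n-k} \\ &\overset{(e)}{=} 2\pi \sum_{n\in\mathbb{Z}} x_n x_n^* = 2\pi \sum_{n\in\mathbb{Z}} |x_n|^2 \overset{(f)}{=} 2\pi \|x\|^2, \end{aligned} \tag{2.103} \]

where (a) follows from the definition of the $\mathcal{L}^2([-\pi, \pi))$ norm; (b) from the definition of the DTFT; (c) from an interchange that is allowed because $x \in \ell^2(\mathbb{Z})$ implies absolute convergence of the sums in the integrand; (d) from (2.80b); (e) from the definition of the Kronecker delta sequence, (1.9); and (f) from the definition of the $\ell^2(\mathbb{Z})$ norm.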

If it were not for the 2π factor, the equality (2.103) would be like the equality (1.51) for a unitary operator; (2.103) is the version of Parseval's equality for the DTFT. Parseval's equality⁴⁸ is often termed the energy conservation property, as the energy (2.98) is the integral of the energy spectral density over the frequency range.

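A computation similar to (2.103) shows that $F/\sqrt{2\pi}$ is a unitary operator:

\[ \Bigl\langle \tfrac{1}{\sqrt{2\pi}} Fx,\ \tfrac{1}{\sqrt{2\pi}} Fy \Bigr\rangle = \langle x, y\rangle \quad \text{for every } x \text{ and } y \text{ in } \ell^2(\mathbb{Z}), \]

or, equivalently,

\[ \langle x, y\rangle = \frac{1}{2\pi}\,\langle X, Y\rangle \quad \text{for every } x \text{ and } y \text{ in } \ell^2(\mathbb{Z}), \tag{2.104} \]

where $X$ and $Y$ are the DTFTs of $x$ and $y$. This is the version of the generalized Parseval's equality for the DTFT, and its proof is left as Exercise 2.5.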
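For finite-length sequences, the inner product $\langle X, Y\rangle$ in (2.104) can be evaluated exactly by sampling: $X(e^{j\omega})Y^*(e^{j\omega})$ is a trigonometric polynomial with frequencies between $-(L-1)$ and $L-1$, so its average over $M \ge L$ equally spaced frequencies equals its average over $[-\pi, \pi)$. The sketch below relies on this shortcut (our own, not part of the text) together with `np.fft.fft`, whose zero-padded output gives exactly those samples for sequences supported on $n = 0, \ldots, L-1$. The random test sequences are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 16
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)  # x_n, y_n on n = 0, ..., L-1
y = rng.standard_normal(L) + 1j * rng.standard_normal(L)

M = 2 * L               # any M >= L samples X Y^* without aliasing
X = np.fft.fft(x, M)    # X(e^{j 2 pi m / M}), m = 0, ..., M-1
Y = np.fft.fft(y, M)

lhs = np.vdot(y, x)            # <x, y> = sum_n x_n y_n^*
rhs = np.mean(X * np.conj(Y))  # (1/2pi) <X, Y>, computed exactly by sampling
print(np.isclose(lhs, rhs))
```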

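Adjoint The adjoint of the DTFT, $F^* : \mathcal{L}^2([-\pi, \pi)) \to \ell^2(\mathbb{Z})$, is determined uniquely by

\[ \langle Fx, y\rangle = \langle x, F^* y\rangle \quad \text{for every } x \in \ell^2(\mathbb{Z}) \text{ and } y \in \mathcal{L}^2([-\pi, \pi)). \]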



48 Recall that what we call Parseval's equality in this book is sometimes called Plancherel's 
equality; what we call generalized Parseval's equality is sometimes called Parseval's theorem. 
Since we have already concluded that $F/\sqrt{2\pi}$ is a unitary operator, by Theorem 1.23,

\[ \Bigl(\frac{1}{\sqrt{2\pi}} F\Bigr)^* = \frac{1}{\sqrt{2\pi}} F^* = \Bigl(\frac{1}{\sqrt{2\pi}} F\Bigr)^{-1} = \sqrt{2\pi}\, F^{-1}. \]

Thus, $F^* = 2\pi F^{-1}$, with $F^{-1}$ given by (2.78b).

2.4.4 Frequency Response of Filters 

The DTFT of a sequence is called its spectrum. The DTFT of a filter (impulse response of an LSI system) $h$ is also called the frequency response:

\[ H(e^{j\omega}) = \sum_{n\in\mathbb{Z}} h_n e^{-j\omega n}, \qquad \omega \in \mathbb{R}. \tag{2.105a} \]

The inverse DTFT of the frequency response recovers the impulse response:

\[ h_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega})\, e^{j\omega n}\, d\omega, \qquad n \in \mathbb{Z}. \tag{2.105b} \]

To understand the frequency response of a filter, we often write the magnitude and phase separately:

\[ H(e^{j\omega}) = |H(e^{j\omega})|\, e^{j\arg(H(e^{j\omega}))}, \]

where the magnitude response $|H(e^{j\omega})|$ is a 2π-periodic, real, nonnegative function, and the phase response $\arg(H(e^{j\omega}))$ is a 2π-periodic real function between −π and π.⁴⁹ A filter is said to have zero phase when its frequency response is real; this is equivalent to the phase response taking only values that are integer multiples of π. A filter is said to have generalized linear phase when its frequency response can be written in the form

\[ H(e^{j\omega}) = r(\omega)\, e^{j(\alpha\omega + \beta)}, \tag{2.106} \]

where $r(\omega)$ is real and α and β are constants; this corresponds to a phase response that is affine in ω (straight lines with slope α), except for jumps by π where $r(\omega)$ changes sign and jumps by 2π where the phase wraps around. When furthermore β = 0, the filter is said to have linear phase. A filter is called bandlimited when its frequency response is finitely supported. Solved Exercise 2.3 explores filters as projections through their frequency response.

Ideal Filters The frequency response of a filter is typically used to design a filter with specific properties, where we want to let certain frequencies pass (the passband) while blocking others (the stopband). An ideal filter is a filter whose magnitude response takes a single nonzero value in its passband and is zero in its stopband. For example, an ideal lowpass filter passes frequencies below some cut-off frequency $\omega_0/2$ and blocks the others; its passband is thus the interval $[-\omega_0/2, \omega_0/2]$. All ideal filters are bandlimited. Figure 2.9(b) gives an example for $\omega_0 = \pi$.



49 The argument of the complex number $H(e^{j\omega})$ can equally well be defined to be on [0, 2π).
[Figure 2.9 panels: ideal lowpass filter, (a) impulse response $h_n$ and (b) magnitude response $|H(e^{j\omega})|$; ideal highpass filter, (c) impulse response and (d) magnitude response; ideal bandpass filter, (e) impulse response and (f) magnitude response.]

Figure 2.9: Impulse and magnitude responses of ideal filters.



To find the impulse response of such an ideal filter, we start with the desired magnitude response:

\[ H(e^{j\omega}) = \begin{cases} \sqrt{2\pi/\omega_0}, & \text{for } |\omega| < \omega_0/2; \\ 0, & \text{otherwise}. \end{cases} \tag{2.107a} \]

This is a zero-phase filter and a box function in frequency.⁵⁰ Applying the inverse DTFT, we obtain the impulse response as

\[ h_n = \frac{1}{2\pi}\sqrt{\frac{2\pi}{\omega_0}} \int_{-\omega_0/2}^{\omega_0/2} e^{j\omega n}\, d\omega = \sqrt{\frac{\omega_0}{2\pi}}\, \mathrm{sinc}(\omega_0 n/2) \tag{2.107b} \]

by elementary integrations, with the $n = 0$ and $n \ne 0$ cases separated. This impulse response is of unit norm. A case of interest is the Nth-band filter, $\omega_0 = 2\pi/N$, and in particular, the halfband filter, $\omega_0 = \pi$, which passes through half of the spectrum, from −π/2 to π/2, with magnitude response as in Figure 2.9(b), and impulse response

\[ h_n = \frac{1}{\sqrt{2}}\, \mathrm{sinc}(\pi n/2) = \frac{1}{\sqrt{2}}\, \frac{\sin(\pi n/2)}{\pi n/2}, \tag{2.108} \]

as in Figure 2.9(a). These ideal filters are summarized in Table 2.5. Their impulse responses decay slowly as $O(1/n)$ and are thus not absolutely summable. This lack of absolute summability of the impulse response $h$ is unavoidable when the desired frequency response $H$ is discontinuous; see Section 2.4.2.

50 Table 3.5 in Chapter 3 summarizes box and sinc functions in time and frequency.
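The slow $O(1/n)$ decay of the halfband impulse response (2.108) can be seen numerically: the truncated energy approaches 1, while the truncated sum of magnitudes keeps growing (logarithmically). In this sketch the truncation lengths are arbitrary and the helper `halfband` is hypothetical. Note that NumPy's `np.sinc(u)` is the normalized $\sin(\pi u)/(\pi u)$, so $\mathrm{sinc}(\pi n/2)$ in the text's convention is `np.sinc(n / 2)`.

```python
import numpy as np


def halfband(N):
    """Ideal halfband impulse response (2.108), truncated to |n| <= N."""
    n = np.arange(-N, N + 1)
    # sinc(pi n / 2) with sinc(t) = sin(t)/t equals np.sinc(n / 2)
    return np.sinc(n / 2) / np.sqrt(2)


for N in (100, 1000, 10000):
    h = halfband(N)
    # energy tends to 1 (unit norm); the l1 sum grows without bound
    print(N, np.sum(h ** 2), np.sum(np.abs(h)))
```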



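Ideal filters

Filter | Time domain | DTFT domain
Ideal lowpass filter | $\sqrt{\omega_0/(2\pi)}\,\mathrm{sinc}(\omega_0 n/2)$ | $\sqrt{2\pi/\omega_0}$ for $|\omega| < \omega_0/2$; 0 otherwise
Ideal Nth-band filter | $(1/\sqrt{N})\,\mathrm{sinc}(\pi n/N)$ | $\sqrt{N}$ for $|\omega| < \pi/N$; 0 otherwise
Ideal halfband lowpass filter | $(1/\sqrt{2})\,\mathrm{sinc}(\pi n/2)$ | $\sqrt{2}$ for $|\omega| < \pi/2$; 0 otherwise

Table 2.5: Ideal filters with unit-norm impulse responses.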



FIR Filters Ideal filters are not realizable; thus, we now explore a few examples of filters with realizable frequency responses. We start with an FIR filter we have already seen in Example 2.2.

Example 2.15 (Moving average filter, Example 2.2 cont'd) The impulse response of the moving average filter in (2.5) is (we assumed N odd):

\[ h_n = \begin{cases} 1/N, & \text{for } |n| \le (N-1)/2; \\ 0, & \text{otherwise}, \end{cases} \tag{2.109a} \]

which is the same, within scaling, as the box sequence from (2.13a). Its frequency response is

\[ \begin{aligned} H(e^{j\omega}) &= \frac{1}{N}\sum_{n=-(N-1)/2}^{(N-1)/2} e^{-j\omega n} \overset{(a)}{=} \frac{1}{N}\, e^{j\omega(N-1)/2} \sum_{k=0}^{N-1} e^{-j\omega k} \overset{(b)}{=} \frac{1}{N}\, e^{j\omega(N-1)/2}\, \frac{1 - e^{-j\omega N}}{1 - e^{-j\omega}} \\ &= \frac{1}{N}\, \frac{e^{j\omega N/2} - e^{-j\omega N/2}}{e^{j\omega/2} - e^{-j\omega/2}} = \frac{1}{N}\, \frac{\sin(\omega N/2)}{\sin(\omega/2)}, \end{aligned} \tag{2.109b} \]

where (a) follows from the change of variable $k = n + (N-1)/2$; and (b) from (P1.65-1), the formula for a finite geometric series. Figure 2.10 shows the impulse response and magnitude response of this filter for N = 7.

[Figure 2.10 panels: (a) impulse response $h_n$, nonzero for n = −3, ..., 3; (b) magnitude response $|H(e^{j\omega})|$.]

Figure 2.10: Moving average filter (2.5) with N = 7. (a) Impulse response. (b) Magnitude response.
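As a sanity check of (2.109b), the sketch below evaluates the DTFT sum (2.105a) for the moving average filter with N = 7, as in Figure 2.10, and compares it with the closed form. The frequency grid is our choice and deliberately avoids ω = 0, where the closed form must be read as its limit, 1.

```python
import numpy as np

N = 7                                           # odd length, as assumed in Example 2.15
n = np.arange(-(N - 1) // 2, (N - 1) // 2 + 1)  # n = -3, ..., 3
h = np.full(N, 1.0 / N)                         # h_n = 1/N for |n| <= (N-1)/2

w = np.linspace(-np.pi, np.pi, 400)             # 400 points: the grid does not contain w = 0
H = np.exp(-1j * np.outer(w, n)) @ h            # direct evaluation of (2.105a)
closed_form = np.sin(w * N / 2) / (N * np.sin(w / 2))  # right-hand side of (2.109b)
print(np.max(np.abs(H - closed_form)))
```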



Linear-Phase Filters Real-valued FIR filters have (generalized) linear phase when they are symmetric or antisymmetric. Consider causal filters with length L, so the support is {0, 1, ..., L − 1}. These filters then satisfy

\[ \text{symmetric: } h_n = h_{L-1-n}; \qquad \text{antisymmetric: } h_n = -h_{L-1-n}. \tag{2.110} \]



These symmetries are illustrated in Figure 2.11 for L even and odd. Let us now show that an even-length, symmetric filter as in part (a) of the figure indeed leads to linear phase; other cases follow similarly.

[Figure 2.11 panels: (a) symmetric, even length; (b) symmetric, odd length; (c) antisymmetric, even length; (d) antisymmetric, odd length.]

Figure 2.11: Filters with symmetries.

We compute the frequency response of this filter:

\[ \begin{aligned} H(e^{j\omega}) &= \sum_{n=0}^{L-1} h_n e^{-j\omega n} \overset{(a)}{=} \sum_{n=0}^{L/2-1} h_n \bigl(e^{-j\omega n} + e^{-j\omega(L-1-n)}\bigr) \\ &= \sum_{n=0}^{L/2-1} h_n\, e^{-j\omega(L-1)/2} \bigl(e^{j\omega(n-(L-1)/2)} + e^{-j\omega(n-(L-1)/2)}\bigr) \\ &\overset{(b)}{=} e^{-j\omega(L-1)/2}\; 2\sum_{n=0}^{L/2-1} h_n \cos\Bigl(\omega\Bigl(n - \frac{L-1}{2}\Bigr)\Bigr) = r(\omega)\, e^{j\alpha\omega}, \end{aligned} \]

with

\[ r(\omega) = 2\sum_{n=0}^{L/2-1} h_n \cos\Bigl(\omega\Bigl(n - \frac{L-1}{2}\Bigr)\Bigr) \tag{2.111a} \]

and

\[ \alpha = -\frac{L-1}{2}. \tag{2.111b} \]

This frequency response fits the form of (2.106), so the filter indeed has linear phase. In the above, (a) follows from gathering factors with the same $h_n$ because of symmetry in (2.110); and (b) from using (2.275).
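The derivation can also be checked numerically for an even-length symmetric filter; the coefficients below are made up for illustration. Removing the linear-phase factor $e^{j\alpha\omega}$, with α from (2.111b), should leave exactly the real function $r(\omega)$ of (2.111a).

```python
import numpy as np

half = np.array([0.1, -0.4, 0.8])        # h_0, ..., h_{L/2-1} (arbitrary example values)
h = np.concatenate([half, half[::-1]])   # symmetric, even length L = 6: h_n = h_{L-1-n}
L = len(h)
n = np.arange(L)

w = np.linspace(-np.pi, np.pi, 301)
H = np.exp(-1j * np.outer(w, n)) @ h     # H(e^{jw}) from (2.105a)

m = np.arange(L // 2)
r = 2 * np.cos(np.outer(w, m - (L - 1) / 2)) @ half  # r(w) from (2.111a), real by construction
alpha = -(L - 1) / 2                                 # alpha from (2.111b)

print(np.allclose(H, r * np.exp(1j * alpha * w)))
```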

Allpass Filters Another important class is filters with unit magnitude response, that is,

\[ |H(e^{j\omega})| = 1. \tag{2.112} \]
Since all frequencies go through without change of magnitude, a filter satisfying (2.112) is called an allpass filter. Allpass filters have some interesting properties:

(i) Energy conservation: The allpass property corresponds to energy conservation, since, using Parseval's equality (2.103), we have

\[ \|y\|^2 = \frac{1}{2\pi}\|Y(e^{j\omega})\|^2 = \frac{1}{2\pi}\|H(e^{j\omega})X(e^{j\omega})\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(e^{j\omega})X(e^{j\omega})|^2\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega = \|x\|^2. \]

(ii) Orthonormal set: The allpass property implies that all the shifts of $h$, $\{\varphi_{k,n} = h_{n-k}\}_{k\in\mathbb{Z}}$, form an orthonormal set:

\[ \begin{aligned} \langle h_n, h_{n-k}\rangle_n &= \sum_{n\in\mathbb{Z}} h_n h_{n-k}^* \overset{(a)}{=} \frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega}) \bigl(e^{-j\omega k} H(e^{j\omega})\bigr)^*\, d\omega \\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j\omega k} H(e^{j\omega}) H^*(e^{j\omega})\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j\omega k} |H(e^{j\omega})|^2\, d\omega \\ &\overset{(b)}{=} \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j\omega k}\, d\omega \overset{(c)}{=} \mathrm{sinc}(k\pi) \overset{(d)}{=} \delta_k, \end{aligned} \tag{2.113} \]

where (a) follows from the generalized Parseval's equality (2.104) and the shift in time property (2.85); (b) from our assumption; (c) from (2.107b) with $\omega_0 = 2\pi$ and scaling of the filter's magnitude response; and (d) from (2.8c). We summarize this property as

\[ \langle h_n, h_{n-k}\rangle_n = \delta_k \quad \Longleftrightarrow \quad |H(e^{j\omega})| = 1. \tag{2.114} \]

(iii) Orthonormal basis: The allpass property implies that $\{\varphi_{k,n} = h_{n-k}\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $\ell^2(\mathbb{Z})$. Having already shown in (2.113) that the set is orthonormal, we check whether we can write any $x_n$ as

\[ x_n = \sum_{k\in\mathbb{Z}} \beta_k \varphi_{k,n} = \sum_{k\in\mathbb{Z}} \beta_k h_{n-k}, \]

with $\beta_k = \langle x_n, h_{n-k}\rangle_n$. It is sufficient to verify that $\|\beta\| = \|x\|$. Write

\[ \beta_k = \sum_{n\in\mathbb{Z}} x_n h_{n-k}^* = \sum_{n\in\mathbb{Z}} x_n h_{-(k-n)}^* = (x_n * h_{-n}^*)_k, \]

and thus,

\[ \|\beta\|^2 \overset{(a)}{=} \frac{1}{2\pi}\|X(e^{j\omega}) H^*(e^{j\omega})\|^2 \overset{(b)}{=} \frac{1}{2\pi}\|X(e^{j\omega})\|^2 \overset{(c)}{=} \|x\|^2, \]

where (a) follows from the convolution property (2.92), Parseval's equality (2.103) and (2.89); (b) from $H(e^{j\omega})$ having unit magnitude for all ω; and (c) from Parseval's equality again. Figure 2.12 shows the phase of $H(e^{j\omega})$ given in (2.115).

This discussion contains a piece of good news, namely that there exist shift-invariant orthonormal bases for $\ell^2(\mathbb{Z})$, as well as a piece of bad news: these bases have no frequency selectivity (they are allpass sequences). This is one of the main reasons to search for more general orthonormal bases for $\ell^2(\mathbb{Z})$, as we do in Part II of the book.

Example 2.16 (Allpass filters) Consider the simple shift-by-k filter given in (2.38a) with the impulse response $h_n = \delta_{n-k}$. By evaluating (2.105a), the frequency response is $H(e^{j\omega}) = e^{-j\omega k}$. Thus, $h$ is an allpass filter:

\[ |H(e^{j\omega})| = 1, \qquad \arg(H(e^{j\omega})) = -\omega k \bmod 2\pi. \]

This filter has linear phase with a slope −k given by the delay.

We now look at a more sophisticated allpass filter. It provides an example where key properties that are not plainly visible in the time domain become obvious in the frequency domain. The filter is built from

\[ g_n = a^n u_n, \qquad g = [\,\ldots\ 0\ \boxed{1}\ a\ a^2\ a^3\ \ldots\,], \]

with $a \in \mathbb{C}$ and $|a| < 1$, where $u_n$ is the Heaviside sequence from (2.10). Suppose $h$ satisfies

\[ h_n = -a^* g_n + g_{n-1}, \qquad n \in \mathbb{Z}. \]

We now show that $h$ is an allpass filter, so filtering a sequence $x$ with $h$ will not change the magnitude of its DTFT; moreover, $h$ is of norm 1 and orthogonal to all its shifts as in (2.114). To start, find the frequency response of $g_n$,

\[ G(e^{j\omega}) = \sum_{n\in\mathbb{N}} a^n e^{-j\omega n} \overset{(a)}{=} \frac{1}{1 - a e^{-j\omega}}, \]

where (a) follows from the formula for the infinite geometric series (P1.65-3). Then,

\[ H(e^{j\omega}) = -a^* G(e^{j\omega}) + e^{-j\omega} G(e^{j\omega}) = \frac{e^{-j\omega} - a^*}{1 - a e^{-j\omega}}. \tag{2.115} \]

The magnitude squared of $H(e^{j\omega})$ is

\[ |H(e^{j\omega})|^2 = H(e^{j\omega})\, H^*(e^{j\omega}) = \frac{(e^{-j\omega} - a^*)(e^{j\omega} - a)}{(1 - a e^{-j\omega})(1 - a^* e^{j\omega})} = 1, \]

since the numerator and the denominator both expand to $1 + |a|^2 - a e^{-j\omega} - a^* e^{j\omega}$; thus $|H(e^{j\omega})| = 1$ for all ω. The phase response is shown in Figure 2.12.
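The sketch below checks both claims for one arbitrary choice of $a$ (an assumption; any $|a| < 1$ should do): the closed form (2.115) has unit magnitude on a frequency grid, and a truncated version of $h_n = -a^* g_n + g_{n-1}$ has unit norm and is orthogonal to its shifts, as in (2.114). Truncating at K = 200 samples is our choice; the neglected tail is of order $|a|^K$. The helper `inner_shift` is hypothetical.

```python
import numpy as np

a = 0.5 + 0.3j   # example value with |a| < 1
K = 200          # truncation length (assumed long enough)
n = np.arange(K)

g = a ** n                 # g_n = a^n u_n, truncated
h = -np.conj(a) * g        # -a^* g_n
h[1:] += g[:-1]            # + g_{n-1}

# the closed form (2.115) has unit magnitude
w = np.linspace(-np.pi, np.pi, 257)
H = (np.exp(-1j * w) - np.conj(a)) / (1 - a * np.exp(-1j * w))
print(np.allclose(np.abs(H), 1.0))


def inner_shift(h, k):
    """<h_n, h_{n-k}>_n = sum_n h_n h*_{n-k}, for a finite array h and k >= 0."""
    return np.sum(h[k:] * np.conj(h[:len(h) - k]))


print([abs(inner_shift(h, k)) for k in range(4)])  # close to 1 for k = 0, close to 0 otherwise
```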
[Figure: phase response $\arg(H(e^{j\omega}))$ plotted for ω from −π to π.]

Figure 2.12: Phase of a first-order allpass filter as in (2.115) with a = 1/2.



2.5 z-Transform

While the DTFT has many nice properties, its use is limited by the convergence issues discussed in Section 2.4.2. The z-transform introduces a set of rescalings of sequences so that almost any sequence has a rescaling such that the DTFT converges. This makes the z-transform much more widely applicable than the DTFT.

Take the Heaviside sequence from (2.10), which is neither in $\ell^1(\mathbb{Z})$ nor $\ell^2(\mathbb{Z})$ (nor any other $\ell^p(\mathbb{Z})$ space except $\ell^\infty(\mathbb{Z})$), and thus has no DTFT. If we were to multiply it by a geometric sequence $r^n$, with $r \in [0, 1)$, yielding $x_n = r^n u_n$, we could take the DTFT of $x_n$, as we now have an absolutely summable sequence. By controlling the rescaling with $r$, we have a set of DTFTs indexed by $r$, and we can think of this as a new transform with arguments $r$ and ω. Combining $r$ and ω through $z = re^{j\omega}$, we obtain a transform with argument $z \in \mathbb{C}$, where $z$ need not have unit modulus.

Because of the close connection to the DTFT, we will have a convolution property as well as many other properties similar to those in Section 2.4.3, but now for more general sequences. Indeed, as we will see shortly, convolution of finite-length sequences becomes polynomial multiplication in the z-transform domain. This is the essential motivation behind extending the analysis that uses the unit-modulus complex exponential sequences in (2.75) to more general complex exponential sequences $v_n = z^n = (re^{j\omega})^n$.



2.5.1 Definition of the z-Transform



Eigensequences of the Convolution Operator The eigensequence property (2.76) extends from complex exponentials with unit modulus to those with any modulus. Consider the sequence

\[ v_n = z^n = (re^{j\omega})^n, \qquad n \in \mathbb{Z}, \tag{2.116} \]

where $r \in [0, \infty)$ and $\omega \in \mathbb{R}$, so $z$ is any complex number. Like a complex exponential sequence with unit modulus, this is also an eigensequence of the convolution operator $H$ associated with the LSI system with impulse response $h$, since

\[ (Hv)_n = (h * v)_n = \sum_{k\in\mathbb{Z}} v_{n-k} h_k = \sum_{k\in\mathbb{Z}} z^{n-k} h_k = \Bigl(\sum_{k\in\mathbb{Z}} h_k z^{-k}\Bigr) z^n. \tag{2.117} \]

This shows that applying the convolution operator $H$ to the sequence $v$ gives a scalar multiple of $v$; $v$ is an eigensequence of $H$ with corresponding eigenvalue $\lambda_z = \sum_{k\in\mathbb{Z}} h_k z^{-k}$. We call that eigenvalue $H(z)$; it is defined formally in (2.119). We can thus rewrite (2.117) as

\[ H z^n = h * z^n = H(z)\, z^n. \tag{2.118} \]

The key distinction from (2.76) is that the set of impulse responses $h$ for which the sum (2.117) converges now depends on $|z|$.
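For a finite impulse response, the eigensequence relation (2.118) can be verified directly at a nonzero complex $z$ off the unit circle. The filter taps and the value of $z$ below are arbitrary illustrative choices.

```python
import numpy as np

h = np.array([1.0, -0.5, 0.25, 2.0])  # finite impulse response h_0, ..., h_3 (example)
z = 0.8 * np.exp(1j * 0.3)            # nonzero complex z with |z| != 1

k = np.arange(len(h))
Hz = np.sum(h * z ** (-k))            # eigenvalue H(z) = sum_k h_k z^{-k}

# (h * v)_n = sum_k h_k v_{n-k} with v_n = z^n, checked for a few n
for n in range(-3, 4):
    out = np.sum(h * z ** (n - k))
    assert np.isclose(out, Hz * z ** n)  # (2.118): h * z^n = H(z) z^n
print(Hz)
```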

z-Transform The z-transform is defined similarly to the DTFT in Definition 2.11.



Definition 2.12 (z-transform) The z-transform of a sequence $x$ is

\[ X(z) = \sum_{n\in\mathbb{Z}} x_n z^{-n}, \qquad z \in \mathbb{C}. \tag{2.119} \]

It exists when (2.119) converges absolutely for some values of $z$; these values of $z$ are called the region of convergence (ROC),

\[ \mathrm{ROC} = \Bigl\{ z \;\Big|\; \sum_{n\in\mathbb{Z}} |x_n z^{-n}| < \infty \Bigr\}. \tag{2.120} \]

When the z-transform exists, we denote the z-transform pair as

\[ x_n \;\overset{\text{ZT}}{\longleftrightarrow}\; X(z), \]

where the ROC is part of the specification of $X(z)$.



Relation of the z-Transform to the DTFT Given a sequence $x$ and its z-transform $X(z)$ with an ROC that includes the unit circle $|z| = 1$, the z-transform evaluated on the unit circle is equal to the DTFT of the same sequence:

\[ X(z)\big|_{z=e^{j\omega}} = X(e^{j\omega}). \tag{2.121} \]

Conversely, suppose $y_n = r^{-n} x_n$ has DTFT $Y(e^{j\omega})$. Then

\[ Y(e^{j\omega}) = \sum_{n\in\mathbb{Z}} r^{-n} x_n e^{-j\omega n} = \sum_{n\in\mathbb{Z}} x_n (re^{j\omega})^{-n} = X(re^{j\omega}), \]

so

\[ X(z)\big|_{z=re^{j\omega}} = Y(e^{j\omega}) \tag{2.122} \]

and the circle $|z| = r$ is in the ROC of $X(z)$.

2.5.2 Existence and Convergence of the z-Transform 



Convergence For the z-transform to exist and have $z = re^{j\omega}$ in its ROC, (2.119) must converge absolutely. Since

\[ \sum_{n\in\mathbb{Z}} |x_n z^{-n}| = \sum_{n\in\mathbb{Z}} |x_n r^{-n}|\, |e^{-j\omega n}| = \sum_{n\in\mathbb{Z}} |x_n r^{-n}|, \]

absolute summability of $x_n r^{-n}$ is necessary and sufficient for the circle $|z| = r$ to be in the ROC of $X(z)$. Thus, the ROC is a ring of the form (see also Table 2.6)

\[ \mathrm{ROC} = \{ z \mid 0 \le r_1 < |z| < r_2 \le \infty \}. \tag{2.123} \]

By convention, the ROC concept is extended to $|z| = \infty$ by including $|z| = \infty$ in the ROC when $x_n = 0$ for all $n < 0$ and excluding it otherwise. Similarly, $z = 0$ is in the ROC when $x_n = 0$ for all $n > 0$ and not in the ROC otherwise. Exercise 2.9 explores a number of properties of the ROC.

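Example 2.17 To develop intuition, we look at a few examples:

(i) Shift-by-$n_0$ sequence

\[ x_n = \delta_{n-n_0} \;\overset{\text{ZT}}{\longleftrightarrow}\; X(z) = z^{-n_0}, \qquad \mathrm{ROC} = \begin{cases} \{ z \mid |z| > 0 \}, & \text{if } n_0 > 0; \\ \text{all } z, & \text{if } n_0 = 0; \\ \{ z \mid |z| < \infty \}, & \text{if } n_0 < 0. \end{cases} \]

The shift-by-one maps to $z^{-1}$, which is why $z^{-1}$ is often called a delay operator. It also follows that

\[ x_{n-n_0} \;\overset{\text{ZT}}{\longleftrightarrow}\; z^{-n_0} X(z), \qquad \mathrm{ROC} = \mathrm{ROC}_x, \]

with the only possible changes to the ROC at 0 or ∞.

(ii) Right-sided geometric sequence

\[ x_n = a^n u_n \;\overset{\text{ZT}}{\longleftrightarrow}\; X(z) = \frac{1}{1 - a z^{-1}}, \qquad \mathrm{ROC} = \{ z \mid |z| > |a| \}. \tag{2.124a} \]

Now, $z = a$ is a zero of the denominator of the complex function $X(z)$, and we see that the ROC is bounded from the inside by the circle $|z| = |a|$, which passes through $z = a$. This is a general property, since the ROC cannot contain a singularity (a $z$ such that $X(z)$ does not exist).

(iii) Left-sided geometric sequence

\[ x_n = -a^n u_{-n-1} \;\overset{\text{ZT}}{\longleftrightarrow}\; X(z) = \frac{1}{1 - a z^{-1}}, \qquad \mathrm{ROC} = \{ z \mid |z| < |a| \}. \tag{2.124b} \]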
[Figure 2.13 panels: (a) right-sided sequence $x_n = a^n u_n$; (b) its ROC, $|z| > |a|$; (c) left-sided sequence $x_n = -a^n u_{-n-1}$; (d) its ROC, $|z| < |a|$.]

Figure 2.13: Illustration of Example 2.17. (a) The right-sided geometric sequence and (b) the associated ROC of its z-transform. (c) The left-sided geometric sequence and (d) the associated ROC of its z-transform. The unit circle is marked in both (b) and (d) for reference.



The expression for $X(z)$ is exactly as in the previous case; the only difference is in the ROC. Had we been given only this $X(z)$ without the associated ROC, we would not have been able to tell whether it originated from $x$ in (2.124a) or (2.124b). This shows why the z-transform and its ROC form a pair that should not be broken.

A standard way of showing the ROC is a plot of the complex plane, as in Figure 2.13. Marking the unit circle establishes the scale of the plot, and the DTFT converges for all ω when the unit circle is in the ROC.
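The role of the ROC in Example 2.17 can be seen from partial sums. In this sketch (a = 0.5 and the two test points are arbitrary choices; the helper names are hypothetical), the right-sided series converges to $1/(1 - az^{-1})$ for $|z| > |a|$, the left-sided series converges to the same closed form for $|z| < |a|$, and the right-sided partial sums blow up inside $|z| < |a|$.

```python
import numpy as np

a = 0.5


def X(z):
    """Closed form 1/(1 - a z^{-1}) shared by (2.124a) and (2.124b)."""
    return 1 / (1 - a / z)


def right_sided(z, terms):
    """Partial sum of sum_{n>=0} a^n z^{-n}, the z-transform of a^n u_n."""
    n = np.arange(terms)
    return np.sum((a / z) ** n)


def left_sided(z, terms):
    """Partial sum of sum_{n<=-1} (-a^n) z^{-n}, the z-transform of -a^n u_{-n-1}."""
    n = np.arange(-terms, 0)
    return np.sum(-(a ** n) * z ** (-n))


z_out = 2.0 * np.exp(1j * 1.0)  # |z| > |a|: in the ROC of (2.124a)
z_in = 0.2 * np.exp(1j * 1.0)   # |z| < |a|: in the ROC of (2.124b)
print(abs(right_sided(z_out, 200) - X(z_out)))
print(abs(left_sided(z_in, 200) - X(z_in)))
print(abs(right_sided(z_in, 50)))  # diverging partial sums outside the ROC of (2.124a)
```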



Rational z-Transforms An important class of z-transforms are those that are rational functions, since transfer functions of most realizable systems (systems that can be built and used in practice) are rational. We will see in Section 2.5.4 that these are directly related to difference equations with a finite number of coefficients, as in (2.54). Such transfer functions are of the form

\[ H(z) = \frac{B(z)}{A(z)}, \tag{2.125} \]

where $A(z)$ and $B(z)$ are polynomials in $z^{-1}$, of degree N and M, respectively. We can assume M < N; otherwise, polynomial division would lead to a sum of a polynomial and a rational function satisfying this constraint. The zeros of the numerator $B(z)$ and denominator $A(z)$ are called the zeros and poles of the rational transfer function $H(z)$. Many properties of LSI systems depend on the zeros and poles and their multiplicities.

Consider a finite-length sequence $h = [h_0\ h_1\ \ldots\ h_M]$. Then $H(z) = \sum_{k=0}^{M} h_k z^{-k}$, which has M poles at $z = 0$ and M zeros at the roots $\{z_k\}_{k=1}^{M}$ of the polynomial $z^M H(z)$.⁵¹ Therefore, $H(z)$ can be written as

\[ H(z) = h_0 \prod_{k=1}^{M} (1 - z_k z^{-1}), \qquad |z| > 0, \tag{2.126} \]

where the form of the factorization shows explicitly both the roots as well as the multiplicative factor $h_0$.

The rational z-transform in (2.125) can thus also be written as

\[ H(z) = \frac{b_0}{a_0}\, \frac{\prod_{k=1}^{M} (1 - z_k z^{-1})}{\prod_{k=1}^{N} (1 - p_k z^{-1})}, \tag{2.127} \]

where $\{z_k\}_{k=1}^{M}$ are zeros, $\{p_k\}_{k=1}^{N}$ poles, and $b_0$ and $a_0$ the constant terms of $B(z)$ and $A(z)$. The ROC cannot contain any poles and is thus, assuming a right-sided sequence, all $z$ outside of the pole largest in magnitude. If M is smaller than N, then $H(z)$ has N − M additional zeros at 0. This can be seen in our previous example (2.124a), which can be rewritten as $1/(1 - az^{-1}) = z/(z - a)$ and thus has a pole at $z = a$ and a zero at $z = 0$.
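The factorization (2.126) can be reproduced with NumPy: since $z^M H(z) = h_0 z^M + h_1 z^{M-1} + \cdots + h_M$, `np.roots(h)` returns the zeros $z_k$, and `h[0] * np.poly(zk)` expands $h_0 \prod_k (1 - z_k z^{-1})$ back into the coefficients $h_k$. The filter below is an arbitrary example.

```python
import numpy as np

h = np.array([2.0, -3.0, 1.5, 0.5])  # h_0, ..., h_M with M = 3 (example values)
zk = np.roots(h)                     # roots of h_0 z^M + h_1 z^{M-1} + ... + h_M

# np.poly(zk) gives the monic coefficients of prod_k (z - z_k), highest power first,
# which are also the coefficients of prod_k (1 - z_k z^{-1}) in powers z^0, z^{-1}, ...
print(np.allclose(h[0] * np.poly(zk), h))

z = 1.3 - 0.7j                       # spot check of (2.126) at an arbitrary nonzero z
lhs = np.sum(h * z ** -np.arange(len(h)))
rhs = h[0] * np.prod(1 - zk / z)
print(np.isclose(lhs, rhs))
```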

Inversion

Given a z-transform and its ROC, how do we invert the z-transform? The general inversion formula for the z-transform involves contour integration, a standard topic of complex analysis. However, most z-transforms encountered in practice can be inverted using simpler methods, which we now discuss; Further Reading gives pointers for a more detailed treatment of the inverse z-transform.

Inversion by Inspection This method is just a way of recognizing certain z-transform pairs. For example, from Table 2.6 we see that the z-transform

\[ X(z) = \frac{1}{1 - (1/4) z^{-1}} \]

has the form of $1/(1 - az^{-1})$, with $a = 1/4$. From the table, we can then read the sequence that generated it as one of the following two:

\[ (1/4)^n u_n, \quad \text{if } \mathrm{ROC} = \{ z \mid |z| > 1/4 \}, \]

or

\[ -(1/4)^n u_{-n-1}, \quad \text{if } \mathrm{ROC} = \{ z \mid |z| < 1/4 \}. \]

No other ROC is possible.

51 The fundamental theorem of algebra (Theorem 2.19) states that a degree-M polynomial has M complex roots, counted with multiplicity.
Inversion Using Partial Fraction Expansion When the z-transform is given as a rational function, partial fraction expansion results in a sum of terms, each of which can be inverted by inspection. Here we consider cases in which the numerator and denominator are polynomials in $z^{-1}$, as in (2.127).

(i) M < N, simple poles: If all the N poles are of first order, we can express $X(z)$ as

\[ X(z) = \sum_{k=1}^{N} \frac{A_k}{1 - p_k z^{-1}}, \qquad A_k = (1 - p_k z^{-1})\, X(z)\big|_{z=p_k}. \tag{2.129a} \]

Each term has a simple inverse z-transform, which depends on the ROC of $X(z)$. The ROC takes one of the following forms:

\[ \mathrm{ROC} = \begin{cases} \{ z \mid |z| < |p_1| \}; \\ \{ z \mid |p_k| < |z| < |p_{k+1}| \} & \text{for some } k; \\ \{ z \mid |z| > |p_N| \}, \end{cases} \]

where we have assumed $|p_1| < |p_2| < \cdots < |p_N|$ for simplicity. Each distinct ROC corresponds to a different sequence. Often the ROC is $\{ z \mid |z| > |p_N| \}$, resulting in

\[ x_n = \sum_{k=1}^{N} A_k\, (p_k)^n u_n. \tag{2.129b} \]

(ii) M < N, poles with multiplicity: Suppose $X(z)$ has pole $p_i$ of order s > 1. Then, in general, the ith term in (2.129a) is replaced by s terms

\[ \sum_{k=1}^{s} \frac{C_k}{(1 - p_i z^{-1})^k}. \]

The k = 1 term is inverted as before, and the terms for k > 1 are inverted using the differentiation rule from Table 2.6.

(iii) M ≥ N: Assume all poles are of first order; multiplicities can be treated as above. We can write $X(z)$ as

\[ X(z) = \sum_{k=0}^{M-N} B_k z^{-k} + \sum_{k=1}^{N} \frac{A_k}{1 - p_k z^{-1}}. \tag{2.130a} \]

The first summation of (2.130a) is clearly the z-transform of the sequence

\[ [\,\ldots\ 0\ \boxed{B_0}\ B_1\ B_2\ \ldots\ B_{M-N}\ 0\ \ldots\,]. \]

There are many possible ROCs, each determining a distinct sequence corresponding to the second summation in (2.130a). When the ROC is outside of the largest pole, putting together both summations of (2.130a) yields

\[ x_n = \sum_{k=0}^{M-N} B_k\, \delta_{n-k} + \sum_{k=1}^{N} A_k\, (p_k)^n u_n. \tag{2.130b} \]
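We illustrate the method with an example:

Example 2.18 (Inversion using partial fraction expansion) Given is

\[ X(z) = \frac{1 - z^{-1}}{1 - 5z^{-1} + 6z^{-2}} = \frac{1 - z^{-1}}{(1 - 2z^{-1})(1 - 3z^{-1})}, \]

with poles at $z = 2$ and $z = 3$. We compute the coefficients as in (2.129a):

\[ A_1 = \frac{1 - z^{-1}}{1 - 3z^{-1}}\bigg|_{z=2} = -1, \qquad A_2 = \frac{1 - z^{-1}}{1 - 2z^{-1}}\bigg|_{z=3} = 2, \]

yielding

\[ X(z) = \frac{-1}{1 - 2z^{-1}} + \frac{2}{1 - 3z^{-1}}. \]

The original sequence is then

\[ x_n = \begin{cases} (2^n - 2 \cdot 3^n)\, u_{-n-1}, & \text{if } \mathrm{ROC} = \{ z \mid |z| < 2 \}; \\ -2^n u_n - 2 \cdot 3^n u_{-n-1}, & \text{if } \mathrm{ROC} = \{ z \mid 2 < |z| < 3 \}; \\ (-2^n + 2 \cdot 3^n)\, u_n, & \text{if } \mathrm{ROC} = \{ z \mid |z| > 3 \}. \end{cases} \]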
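Example 2.18 can be reproduced with a few lines of NumPy. This is a sketch rather than a general partial-fraction routine: it assumes simple poles and M < N, computes $A_k$ directly from (2.129a), and checks the causal inverse (2.129b) against the impulse response obtained by running the corresponding difference equation (our own cross-check, not from the text). The helper name `numerator` is hypothetical.

```python
import numpy as np

b = np.array([1.0, -1.0])          # numerator 1 - z^{-1}
a = np.array([1.0, -5.0, 6.0])     # denominator 1 - 5 z^{-1} + 6 z^{-2}
p = np.real_if_close(np.roots(a))  # poles (simple here, and M < N)


def numerator(z):
    """B(z) = sum_m b_m z^{-m}."""
    return np.sum(b * z ** -np.arange(len(b)))


# A_k = (1 - p_k z^{-1}) X(z) evaluated at z = p_k, as in (2.129a)
A = np.array([
    numerator(pk) / (a[0] * np.prod([1 - pj / pk for j, pj in enumerate(p) if j != k]))
    for k, pk in enumerate(p)
])

# causal inverse (ROC |z| > 3), from (2.129b)
n = np.arange(8)
x = np.sum(A[None, :] * p[None, :] ** n[:, None], axis=1)

# reference: impulse response of a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} = b_n
ref = np.zeros(len(n))
for i in range(len(n)):
    acc = b[i] if i < len(b) else 0.0
    for m in range(1, len(a)):
        if i - m >= 0:
            acc -= a[m] * ref[i - m]
    ref[i] = acc / a[0]

print(p, A)                 # poles and their coefficients A_k
print(np.allclose(x, ref))
```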

Inversion Using Power-Series Expansion This method is most useful for finite-length sequences. For example, given $X(z) = (1 - z^{-1})(1 - 2z^{-1})$, we can expand it in its power-series form as

\[ X(z) = 1 - 3z^{-1} + 2z^{-2}. \]

Knowing that each of the elements in this power series corresponds to a delayed Kronecker delta sequence, we can read off the sequence directly:

\[ x_n = \delta_n - 3\delta_{n-1} + 2\delta_{n-2}. \]
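Multiplying polynomials in $z^{-1}$ amounts to convolving their coefficient sequences, which anticipates the convolution property (2.140). A one-line check of the expansion above, using `np.convolve` on the coefficient lists of $1 - z^{-1}$ and $1 - 2z^{-1}$:

```python
import numpy as np

# coefficients in powers z^0, z^{-1}, ... of (1 - z^{-1}) and (1 - 2 z^{-1})
x = np.convolve([1, -1], [1, -2])
print(x)  # coefficients 1, -3, 2, that is, x_n = delta_n - 3 delta_{n-1} + 2 delta_{n-2}
```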

Example 2.19 (Inversion using power-series expansion) Suppose

\[ X(z) = \log(1 + 2z^{-1}), \qquad \text{with } \mathrm{ROC} = \{ z \mid |z| > 2 \}. \tag{2.131} \]

To invert this z-transform, we use its power-series expansion from Table P1.65-1. Substituting $x = 2z^{-1}$, we confirm that $|x| = |2z^{-1}| < 2 \cdot \tfrac{1}{2} = 1$, and thus the series expansion

\[ \log(1 + 2z^{-1}) = \sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{2^n}{n}\, z^{-n} \]

holds for the $z$ values of interest. Thus, the desired inverse z-transform is

\[ x_n = \begin{cases} (-1)^{n+1}\, 2^n n^{-1}, & \text{for } n \ge 1; \\ 0, & \text{otherwise}. \end{cases} \]
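A numerical check of Example 2.19 (a sketch; the test point $z$ is an arbitrary choice inside the ROC and `coeff` is a hypothetical helper): the partial sums of $\sum_{n \ge 1} x_n z^{-n}$ should approach $\log(1 + 2z^{-1})$, computed here with NumPy's principal-branch complex logarithm.

```python
import numpy as np


def coeff(n):
    """Inverse z-transform of log(1 + 2 z^{-1}) for ROC |z| > 2 (Example 2.19)."""
    return (-1) ** (n + 1) * 2.0 ** n / n if n >= 1 else 0.0


z = 5.0 * np.exp(1j * 0.7)  # |z| = 5 > 2, inside the ROC
partial = sum(coeff(n) * z ** (-n) for n in range(1, 80))
print(abs(partial - np.log(1 + 2 / z)))
```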
2.5.3 Properties of the z-Transform

The z-transform has the same properties as the DTFT, but for a larger class of sequences. The main new twist is to properly account for ROCs. As an example, the convolution of two sequences can be computed as a product in the transform domain even when the sequences do not have proper DTFTs, provided that the sequences have some part of their ROCs in common. A summary of z-transform properties can be found in Table 2.6. As convolution in frequency as well as Parseval's equality involve contour integration, we opt not to state them here; a number of standard texts cover those.

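Linearity The z-transform is a linear operator:

\[ \alpha x_n + \beta y_n \;\overset{\text{ZT}}{\longleftrightarrow}\; \alpha X(z) + \beta Y(z), \qquad \mathrm{ROC}_{\alpha x + \beta y} \supseteq \mathrm{ROC}_x \cap \mathrm{ROC}_y. \tag{2.132} \]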

Shift in Time The z-transform pair corresponding to a shift in time by $n_0$ is

\[ x_{n-n_0} \;\overset{\text{ZT}}{\longleftrightarrow}\; z^{-n_0} X(z), \tag{2.133} \]

with no changes to the ROC except possibly at $z = 0$ or $|z| = \infty$.

Scaling in Time Scaling in time appears in two flavors:

(i) The z-transform pair corresponding to scaling in time by N is

$$ x_{Nn} \;\overset{\text{ZT}}{\longleftrightarrow}\; \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^k z^{1/N}\big), \qquad \{ z \mid z^{1/N} \in \text{ROC}_x \}. \tag{2.134} $$

We have already seen this operation of downsampling in (2.87) and will discuss it in more detail in Section 2.7.

(ii) The z-transform pair corresponding to scaling in time by 1/N is

$$ \begin{cases} x_{n/N}, & \text{for } n = \ell N,\ \ell \in \mathbb{Z}; \\ 0, & \text{otherwise}, \end{cases} \;\overset{\text{ZT}}{\longleftrightarrow}\; X(z^N), \qquad \{ z \mid z^N \in \text{ROC}_x \}. \tag{2.135} $$

We have already seen this operation of upsampling in (2.88) and will discuss it in more detail in Section 2.7.
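A small numerical check of the downsampling pair (2.134), written as a sketch under an assumed example: $x_n = a^n u_n$ with $a = 0.5$, so that $x_{Nn} = (a^N)^n u_n$ has the known transform $1/(1 - a^N z^{-1})$. The function names are hypothetical.

```python
import cmath

a = 0.5  # hypothetical: x_n = a^n u_n, so X(z) = 1 / (1 - a z^{-1}) with ROC |z| > a

def X(z):
    return 1 / (1 - a / z)

def downsampled_transform(z, N):
    """(1/N) sum_{k=0}^{N-1} X(W_N^k z^{1/N}); the W_N^k factors visit every N-th root,
    so the choice of branch for z^{1/N} does not matter."""
    W = cmath.exp(-2j * cmath.pi / N)
    root = z ** (1 / N)
    return sum(X(W ** k * root) for k in range(N)) / N

N = 3
z = 1.2 + 0.5j                  # z^{1/N} lies in ROC_x, as required
direct = 1 / (1 - a ** N / z)   # transform of x_{Nn} = (a^N)^n u_n
print(abs(downsampled_transform(z, N) - direct))
```

The averaging over $W_N^k$ is what discards every sample whose index is not a multiple of N.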

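Scaling in z The z-transform pair corresponding to scaling in z by $\alpha^{-1}$ is

$$ \alpha^n x_n \;\overset{\text{ZT}}{\longleftrightarrow}\; X(\alpha^{-1} z), \qquad |\alpha|\, \text{ROC}_x. \tag{2.136} $$

Time Reversal The z-transform pair corresponding to a time reversal $x_{-n}$ is

$$ x_{-n} \;\overset{\text{ZT}}{\longleftrightarrow}\; X(z^{-1}), \qquad \frac{1}{\text{ROC}_x}. \tag{2.137} $$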
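Differentiation The z-transform pair corresponding to differentiation in z is

$$ n^k x_n \;\overset{\text{ZT}}{\longleftrightarrow}\; \Big(-z \frac{d}{dz}\Big)^{k} X(z), \qquad \text{ROC}_x, \tag{2.138} $$

where $(-z\, d/dz)^k$ denotes k successive applications of the operator $-z\, d/dz$; for k = 1, this is simply $n x_n \leftrightarrow -z\, dX(z)/dz$.

Moments Computing the kth moment using the z-transform results in

$$ m_k = \sum_{n\in\mathbb{Z}} n^k x_n = \Big( \sum_{n\in\mathbb{Z}} n^k x_n z^{-n} \Big)\Big|_{z=1} = \Big(-z \frac{d}{dz}\Big)^{k} X(z) \Big|_{z=1}, \qquad k \in \mathbb{N}, \tag{2.139a} $$

as a direct application of (2.138), provided that $z = 1$ lies in $\text{ROC}_x$. The first two moments are:

$$ m_0 = \sum_{n\in\mathbb{Z}} x_n = \Big( \sum_{n\in\mathbb{Z}} x_n z^{-n} \Big)\Big|_{z=1} = X(1), \tag{2.139b} $$

$$ m_1 = \sum_{n\in\mathbb{Z}} n x_n = \Big( \sum_{n\in\mathbb{Z}} n x_n z^{-n} \Big)\Big|_{z=1} = -\frac{dX(z)}{dz}\Big|_{z=1}. \tag{2.139c} $$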
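The moment formula can be checked numerically. The sketch below assumes the example $x_n = a^n u_n$ with $a = 0.5$ (so $z = 1$ is in the ROC), applies the operator $-z\,d/dz$ k times using central differences (a numerical stand-in for exact differentiation), and compares with direct sums $\sum_n n^k x_n$, which equal 2, 2, and 6 for k = 0, 1, 2. All names are our own.

```python
a = 0.5  # hypothetical example: x_n = a**n u_n, whose ROC |z| > a contains z = 1

def X(z):
    """z-transform of x_n = a**n u_n: 1 / (1 - a z^{-1})."""
    return 1.0 / (1.0 - a / z)

def minus_z_d_dz(f, h=1e-4):
    """Return the function z -> -z f'(z), with f' from a central difference."""
    return lambda z: -z * (f(z + h) - f(z - h)) / (2.0 * h)

def moment_from_transform(k):
    """m_k = (-z d/dz)^k X(z) evaluated at z = 1."""
    g = X
    for _ in range(k):
        g = minus_z_d_dz(g)
    return g(1.0)

def moment_direct(k, n_terms=400):
    """m_k = sum_n n**k x_n, truncated (the tail is negligible for a = 0.5)."""
    return sum(n ** k * a ** n for n in range(n_terms))

for k in range(3):
    print(k, moment_direct(k), moment_from_transform(k))
```

Finite differences lose a few digits per nesting level, so this is a sanity check rather than a way to compute high-order moments.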



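Convolution in Time The z-transform pair corresponding to convolution in time is

$$ (h * x)_n \;\overset{\text{ZT}}{\longleftrightarrow}\; H(z)\, X(z), \qquad \text{ROC}_{h*x} \supseteq \text{ROC}_h \cap \text{ROC}_x. \tag{2.140} $$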



This key result is the z-transform analogue of DTFT property (2.92). The z-transform of y = h * x can be obtained with slight modifications of (2.93):

$$ \begin{aligned} Y(z) &\overset{(a)}{=} \sum_{n\in\mathbb{Z}} y_n z^{-n} \overset{(b)}{=} \sum_{n\in\mathbb{Z}} \Big( \sum_{k\in\mathbb{Z}} x_k h_{n-k} \Big) z^{-n} = \sum_{n\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} x_k z^{-k} h_{n-k} z^{-(n-k)} \\ &\overset{(c)}{=} \sum_{k\in\mathbb{Z}} x_k z^{-k} \sum_{n\in\mathbb{Z}} h_{n-k} z^{-(n-k)} \overset{(d)}{=} X(z)\, H(z), \end{aligned} \tag{2.141} $$

where (a) follows from the definition of the z-transform; (b) from the definition of convolution; (c) from interchanging the order of summation; and (d) from the definition of the z-transform. The distinction from (2.93) is that (c) may hold even when the DTFTs of x and h do not exist: when $z \in \text{ROC}_h \cap \text{ROC}_x$, each series following (c) is absolutely convergent, which justifies the interchange. The wider applicability of (2.140) compared with (2.92) is a key feature of the z-transform.

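Table 2.6: Properties of the z-transform.

Property | Time domain | z domain | ROC

ROC properties
General sequence | | | $0 \le r_1 < \lvert z \rvert < r_2 \le \infty$
Finite-length sequence | | | all z, except possibly 0, ∞
Right-sided sequence | | | $\lvert z \rvert >$ magnitude of the largest pole
Left-sided sequence | | | $\lvert z \rvert <$ magnitude of the smallest pole
BIBO stable | | | contains the unit circle $\lvert z \rvert = 1$

Basic properties
Linearity | $\alpha x_n + \beta y_n$ | $\alpha X(z) + \beta Y(z)$ | $\supseteq \text{ROC}_x \cap \text{ROC}_y$
Shift in time | $x_{n-n_0}$ | $z^{-n_0} X(z)$ | $\text{ROC}_x$
Scaling in time: downsampling | $x_{Nn}$ | $(1/N) \sum_{k=0}^{N-1} X(W_N^k z^{1/N})$ | $\{ z \mid z^{1/N} \in \text{ROC}_x \}$
Scaling in time: upsampling | $x_{n/N}$ for $n = \ell N$; 0, otherwise | $X(z^N)$ | $\{ z \mid z^N \in \text{ROC}_x \}$
Scaling in z | $\alpha^n x_n$ | $X(\alpha^{-1} z)$ | $\lvert \alpha \rvert\, \text{ROC}_x$
Time reversal | $x_{-n}$ | $X(z^{-1})$ | $1/\text{ROC}_x$
Differentiation | $n^k x_n$ | $(-z\, d/dz)^k X(z)$ | $\text{ROC}_x$
Moments | $m_k = \sum_{n\in\mathbb{Z}} n^k x_n$ | $(-z\, d/dz)^k X(z)$ at $z = 1$ |
Convolution in time | $(h * x)_n$ | $H(z) X(z)$ | $\supseteq \text{ROC}_h \cap \text{ROC}_x$
Deterministic autocorrelation | $a_n = \sum_{k\in\mathbb{Z}} x_k x^*_{k-n}$ | $A(z) = X(z) X_*(z^{-1})$ | $\text{ROC}_x \cap 1/\text{ROC}_x$
Deterministic crosscorrelation | $c_n = \sum_{k\in\mathbb{Z}} x_k y^*_{k-n}$ | $C(z) = X(z) Y_*(z^{-1})$ | $\text{ROC}_x \cap 1/\text{ROC}_y$

Symmetries
Conjugate | $x_n^*$ | $X^*(z^*)$ | $\text{ROC}_x$
Conjugate, time reversed | $x_{-n}^*$ | $X_*(z^{-1})$ | $1/\text{ROC}_x$
Real part | $\Re(x_n)$ | $(X(z) + X^*(z^*))/2$ | $\text{ROC}_x$
Imaginary part | $\Im(x_n)$ | $(X(z) - X^*(z^*))/(2j)$ | $\text{ROC}_x$
X conjugate symmetric | $x_n$ real | $X(z) = X^*(z^*)$ | $\text{ROC}_x$

Common transform pairs
Kronecker delta sequence | $\delta_n$ | $1$ | all z
Shift by k | $\delta_{n-k}$ | $z^{-k}$ | all z, except possibly 0, ∞
Exponential sequence | $\alpha^n u_n$ | $1/(1 - \alpha z^{-1})$ | $\lvert z \rvert > \lvert \alpha \rvert$
Exponential sequence | $-\alpha^n u_{-n-1}$ | $1/(1 - \alpha z^{-1})$ | $\lvert z \rvert < \lvert \alpha \rvert$
Differentiation | $n \alpha^n u_n$ | $\alpha z^{-1}/(1 - \alpha z^{-1})^2$ | $\lvert z \rvert > \lvert \alpha \rvert$
Differentiation | $-n \alpha^n u_{-n-1}$ | $\alpha z^{-1}/(1 - \alpha z^{-1})^2$ | $\lvert z \rvert < \lvert \alpha \rvert$

Example 2.20 (z-transform convolution property) For some $a \in \mathbb{R}^+$, consider

$$ h_n = a^n u_n, \qquad x_n = u_n. $$

We cannot use the DTFT to compute the convolution y = h * x because x does not have a DTFT, and for a > 1, neither does h. The z-transform exists for both x and h, and it can be used to compute the convolution provided that the ROCs overlap. The z-transforms are

$$ X(z) = \frac{1}{1 - z^{-1}}, \quad \text{ROC}_x = \{ z \mid |z| > 1 \}, \qquad H(z) = \frac{1}{1 - a z^{-1}}, \quad \text{ROC}_h = \{ z \mid |z| > a \}, $$

and thus

$$ Y(z) = \frac{1}{(1 - a z^{-1})(1 - z^{-1})}, \qquad \text{ROC}_y \supseteq \{ z \mid |z| > \max\{a, 1\} \}. $$

By partial fraction expansion (assuming $a \neq 1$), we can rewrite Y(z) as

$$ Y(z) = \frac{-a/(1-a)}{1 - a z^{-1}} + \frac{1/(1-a)}{1 - z^{-1}}, $$

leading to

$$ y_n = -\frac{a}{1-a}\, a^n u_n + \frac{1}{1-a}\, u_n = \frac{1 - a^{n+1}}{1 - a}\, u_n. $$

As a check, we can compute the time-domain convolution directly: for $n \ge 0$,

$$ y_n = \sum_{k\in\mathbb{Z}} x_k h_{n-k} = \sum_{k=0}^{n} a^{n-k} = \sum_{k=0}^{n} a^{k} = \frac{1 - a^{n+1}}{1 - a}, $$

and $y_n = 0$ for $n < 0$. When $a \in [0, 1)$, the DTFT of h exists, but we nevertheless needed the z-transform to compute the convolution because the DTFT of x does not exist. When a > 1, the DTFT of y does not exist, while the z-transform Y(z) exists with ROC $\{ z \mid |z| > a \}$.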
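The closed form from the partial fraction expansion can be compared against a direct time-domain convolution. This sketch truncates both causal sequences to 20 samples (so only the first 20 outputs are exact) and tries one value of a below 1 and one above; the function names are hypothetical.

```python
def convolve_causal(h, x):
    """y_n = sum_{k=0}^{n} x_k h_{n-k} for causal sequences given by their first samples;
    only the first min(len(h), len(x)) outputs are exact."""
    n_out = min(len(h), len(x))
    return [sum(x[k] * h[n - k] for k in range(n + 1)) for n in range(n_out)]

def closed_form(a, n):
    """y_n = (1 - a**(n+1)) / (1 - a) for n >= 0, valid for a != 1."""
    return (1 - a ** (n + 1)) / (1 - a)

for a in (0.5, 2.0):                 # one value with a DTFT for h, one without
    h = [a ** n for n in range(20)]  # h_n = a^n u_n, truncated
    x = [1.0] * 20                   # x_n = u_n, truncated
    y = convolve_causal(h, x)
    print(a, max(abs(y[n] - closed_form(a, n)) for n in range(20)))
```

For a = 2 the outputs grow without bound, which is exactly why only the z-transform (with ROC |z| > 2) describes y there.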

Example 2.21 (Failure of z-transform convolution property) Here is an example where the convolution sum converges, but even the z-transform does not help in computing it:

$$ x_n = 1, \quad n \in \mathbb{Z}, \qquad h_n = a^n u_n, \quad 0 < a < 1. $$

We can compute the convolution directly:

$$ y_n = (h * x)_n = \sum_{k\in\mathbb{Z}} h_k x_{n-k} = \sum_{k\in\mathbb{N}} a^k = \frac{1}{1-a}. $$

However, there are no values of z for which the z-transform of x converges: the part $\sum_{n \ge 0} z^{-n}$ requires $|z| > 1$, while the part $\sum_{n < 0} z^{-n}$ requires $|z| < 1$. That is, the ROC is empty. This prohibits the use of the z-transform for the computation of this convolution.



For finite-length, right-sided sequences, (2.140) connects convolution with polynomial multiplication. Given a length-N sequence x and a length-M impulse response h, the z-transforms of x and h are

$$ H(z) = \sum_{n=0}^{M-1} h_n z^{-n}, \qquad X(z) = \sum_{n=0}^{N-1} x_n z^{-n}. $$

Each is a polynomial in $z^{-1}$. The product polynomial H(z)X(z) has powers of $z^{-1}$ going from 0 to M + N − 2, and its nth coefficient is obtained from the coefficients in H(z) and X(z) that have powers summing to n, that is, the convolution h * x given in (2.59).
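The correspondence between polynomial multiplication and linear convolution can be sketched directly: multiply the coefficient lists of two polynomials in $z^{-1}$ and compare the product's value at a test point with $H(z)X(z)$. The sequences below are hypothetical examples.

```python
def poly_mult(h, x):
    """Coefficients of H(z)X(z) as a polynomial in z^{-1}, i.e., the linear convolution h * x."""
    y = [0] * (len(h) + len(x) - 1)   # powers 0 .. M + N - 2
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj       # power i of H times power j of X lands on power i + j
    return y

def eval_poly_zinv(c, z):
    """Evaluate sum_n c_n z^{-n}."""
    return sum(cn * z ** (-n) for n, cn in enumerate(c))

h = [1, 2, 3]       # hypothetical length-M filter, M = 3
x = [4, 0, -1, 5]   # hypothetical length-N input, N = 4
y = poly_mult(h, x)
print(y)            # [4, 8, 11, 3, 7, 15]
```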
Deterministic Autocorrelation The z-transform pair corresponding to the deterministic autocorrelation of a sequence x is

$$ a_n = \sum_{k\in\mathbb{Z}} x_k x^*_{k-n} \;\overset{\text{ZT}}{\longleftrightarrow}\; A(z) = X(z)\, X_*(z^{-1}), \qquad \text{ROC}_x \cap \frac{1}{\text{ROC}_x}, \tag{2.142} $$

where $X_*(z)$ denotes $X^*(z^*)$, which amounts to conjugating the coefficients but not z. This z-transform satisfies

$$ A(z) = A_*(z^{-1}). \tag{2.143a} $$

For a real x,

$$ A(z) = X(z)\, X(z^{-1}) = A(z^{-1}). \tag{2.143b} $$

The proof of (2.143b) is left for Exercise 2.13. We know that on the unit circle, the DTFT of the deterministic autocorrelation is the squared magnitude of the spectrum, $|X(e^{j\omega})|^2$, as in (2.96). This quadratic form, when extended to the z-plane, leads to a particular symmetry of poles and zeros when X(z), and thus A(z) as well, are rational functions.
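For a finite real sequence, both facts above can be checked numerically: on the unit circle $A(e^{j\omega}) = |X(e^{j\omega})|^2$, and off the unit circle $A(z) = X(z)X(z^{-1}) = A(z^{-1})$. The sequence and helper names are hypothetical.

```python
import cmath

def autocorrelation(x):
    """a_n = sum_k x_k conj(x_{k-n}) for x supported on 0..L-1; returns {n: a_n}."""
    L = len(x)
    return {n: sum(x[k] * x[k - n].conjugate() for k in range(L) if 0 <= k - n < L)
            for n in range(-(L - 1), L)}

def z_transform(seq, z):
    """Evaluate sum_n seq[n] z^{-n} for a dict {n: value}."""
    return sum(v * z ** (-n) for n, v in seq.items())

x = [1.0, -0.5, 2.0]            # hypothetical real sequence x_0, x_1, x_2
a = autocorrelation(x)
X = dict(enumerate(x))
z_on_circle = cmath.exp(0.7j)   # e^{j omega} with omega = 0.7
print(z_transform(a, z_on_circle), abs(z_transform(X, z_on_circle)) ** 2)
```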



Theorem 2.13 (Rational autocorrelation) A rational function A(z) is the z-transform of the deterministic autocorrelation of a stable real sequence x if and only if:

(i) its complex poles and zeros appear in quadruples:

$$ \{ z_i, z_i^*, z_i^{-1}, (z_i^{-1})^* \}, \qquad \{ p_i, p_i^*, p_i^{-1}, (p_i^{-1})^* \}; \tag{2.144a} $$

(ii) its real poles and zeros appear in pairs:

$$ \{ z_i, z_i^{-1} \}, \qquad \{ p_i, p_i^{-1} \}; \tag{2.144b} $$

and

(iii) its zeros on the unit circle are double zeros:

$$ \{ z_i, z_i^*, z_i^{-1}, (z_i^{-1})^* \} = \{ e^{j\omega_i}, e^{-j\omega_i}, e^{-j\omega_i}, e^{j\omega_i} \}, \tag{2.144c} $$

with possibly double zeros at z = ±1. There are no poles on the unit circle.



Proof. The proof follows from the following two facts:

1. $a_n$ is real, since $x_n$ is real. From Table 2.6, this means that:

$$ A^*(z) = A(z^*) \;\Rightarrow\; \begin{cases} p_i \text{ pole} \Rightarrow p_i^* \text{ pole}, \\ z_i \text{ zero} \Rightarrow z_i^* \text{ zero}. \end{cases} \tag{2.145a} $$

2. $a_n$ is symmetric, since $a_{-n} = a_n$. From Table 2.6, this means that:

$$ A(z^{-1}) = A(z) \;\Rightarrow\; \begin{cases} p_i \text{ pole} \Rightarrow p_i^{-1} \text{ pole}, \\ z_i \text{ zero} \Rightarrow z_i^{-1} \text{ zero}. \end{cases} \tag{2.145b} $$

We now proceed to prove that A(z) being the z-transform of the autocorrelation of a stable and real sequence x implies (i)–(iii). The converse follows similarly.

(i) From (2.145a)–(2.145b), we have that

$$ p_i \text{ pole} \;\Rightarrow\; p_i^* \text{ pole and } p_i^{-1} \text{ pole} \;\Rightarrow\; (p_i^*)^{-1} \text{ pole}, $$

similarly for zeros, and we obtain the pole/zero quadruples in (2.144a).

(ii) If a zero/pole is real, it is its own conjugate, and thus the quadruples in (2.144a) become pairs in (2.144b).

(iii) Since x is stable, there are no poles on the unit circle. Since x is real, $X^*(z) = X(z^*)$, so the zeros of X(z) come in conjugate pairs. Thus, the zeros of a rational A(z) on the unit circle come only from X(z) and $X(z^{-1})$. For a zero $z_i$ on the unit circle, $z_i^{-1} = z_i^*$, and

$$ \begin{aligned} z_i \text{ zero of } X(z) &\;\Rightarrow\; z_i^{-1} \text{ zero of } X(z^{-1}), \\ z_i^* \text{ zero of } X(z) &\;\Rightarrow\; (z_i^*)^{-1} \text{ zero of } X(z^{-1}) \;\Rightarrow\; z_i = (z_i^*)^{-1} \text{ zero of } X(z^{-1}). \end{aligned} $$

Thus, both X(z) and $X(z^{-1})$ have $z_i$ as a zero, leading to double zeros on the unit circle.



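Deterministic Crosscorrelation The z-transform pair corresponding to the deterministic crosscorrelation of sequences x and y is

$$ c_n = \sum_{k\in\mathbb{Z}} x_k y^*_{k-n} \;\overset{\text{ZT}}{\longleftrightarrow}\; C_{x,y}(z) = X(z)\, Y_*(z^{-1}), \qquad \text{ROC}_x \cap \frac{1}{\text{ROC}_y}, \tag{2.146} $$

and satisfies

$$ C_{x,y}(z) = C_{y,x,*}(z^{-1}). \tag{2.147a} $$

For x, y real,

$$ C_{x,y}(z) = X(z)\, Y(z^{-1}) = C_{y,x}(z^{-1}). \tag{2.147b} $$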



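Deterministic Autocorrelation of Vector Sequences The z-transform pair corresponding to the deterministic autocorrelation of a vector sequence x is

$$ A_n \;\overset{\text{ZT}}{\longleftrightarrow}\; A(z) = \begin{bmatrix} A_0(z) & C_{0,1}(z) & \cdots & C_{0,N-1}(z) \\ C_{1,0}(z) & A_1(z) & \cdots & C_{1,N-1}(z) \\ \vdots & \vdots & \ddots & \vdots \\ C_{N-1,0}(z) & C_{N-1,1}(z) & \cdots & A_{N-1}(z) \end{bmatrix}, \tag{2.148} $$

where $A_n$ is given in (2.22). Because of (2.20a), A(z) satisfies

$$ A(z) = \begin{bmatrix} A_0(z) & C_{0,1}(z) & \cdots & C_{0,N-1}(z) \\ C_{0,1,*}(z^{-1}) & A_1(z) & \cdots & C_{1,N-1}(z) \\ \vdots & \vdots & \ddots & \vdots \\ C_{0,N-1,*}(z^{-1}) & C_{1,N-1,*}(z^{-1}) & \cdots & A_{N-1}(z) \end{bmatrix} = A_*(z^{-1}). \tag{2.149a} $$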
Here, $A_*(z) = A^*(z^*)$ extends the previous notation to mean transposition of A and conjugation of coefficients, but not of z.$^{52}$ For a real x,

$$ A(z) = A^T(z^{-1}). \tag{2.149b} $$

Spectral Factorization The particular pattern of poles and zeros that characterizes a rational autocorrelation in Theorem 2.13 leads to a key procedure called spectral factorization. This amounts to taking the square root of $A(e^{j\omega})$ and, by extension, of A(z), factoring it into rational factors X(z) and $X(z^{-1})$,$^{53}$ as a direct corollary of Theorem 2.13.



Corollary 2.14 (Spectral factorization) A rational z-transform A(z) is the deterministic autocorrelation of a stable real sequence $x_n$ if and only if it can be factored as $A(z) = X(z)\, X(z^{-1})$.



Spectral factorization amounts to assigning poles and zeros from the quadruples and pairs (2.144a)–(2.144c) to X(z) and $X(z^{-1})$. For the poles, there is a unique rule: take all poles inside the unit circle and assign them to X(z). This is because stability of a causal x requires X(z) to have only poles inside the unit circle (Proposition 2.15), while x real requires that conjugate pairs be kept together. For the zeros, there is a choice, since we are not forced to assign only zeros inside the unit circle to X(z). Doing so, however, creates a unique solution called the minimum-phase solution.$^{54}$ It is now clear why it is important that the zeros on the unit circle appear in pairs: this allows one zero of each pair to be assigned to X(z) and the other to $X(z^{-1})$.
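A minimal sketch of the minimum-phase assignment for a finite-length, real, symmetric autocorrelation with no zeros on the unit circle. It assumes a root finder is available; here we swap in the Durand–Kerner iteration, which is adequate for simple roots but converges poorly for the double unit-circle zeros of (2.144c), so that case is not handled. The zeros inside the unit circle go to X(z), and the overall scale is fixed by matching $a_0 = \sum_n x_n^2$. All function names are hypothetical.

```python
import math

def poly_eval(coeffs, z):
    """Horner evaluation; coeffs are in descending powers of z."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc

def poly_roots(coeffs, iters=500):
    """All roots by the Durand-Kerner iteration (assumes simple roots)."""
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]
    roots = [complex(0.4, 0.9) ** k for k in range(n)]
    for _ in range(iters):
        new_roots = []
        for i, r in enumerate(roots):
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            new_roots.append(r - poly_eval(monic, r) / denom)
        roots = new_roots
    return roots

def min_phase_factor(a):
    """Minimum-phase x whose autocorrelation is a = [a_{-(L-1)}, ..., a_{L-1}] (real, symmetric).

    z^{L-1} A(z) is a polynomial in z whose descending coefficients are exactly a;
    its zeros inside the unit circle are assigned to X(z).
    """
    inside = [r for r in poly_roots(a) if abs(r) < 1]
    q = [1 + 0j]
    for r in inside:  # multiply the running product by (1 - r z^{-1})
        q = [(q[i] if i < len(q) else 0) - (r * q[i - 1] if i > 0 else 0)
             for i in range(len(q) + 1)]
    q = [c.real for c in q]  # zeros come in conjugate pairs, so imaginary parts are round-off
    a0 = a[len(a) // 2]
    scale = math.sqrt(a0 / sum(c * c for c in q))  # match a_0 = sum_n x_n^2
    return [scale * c for c in q]

print(min_phase_factor([2.0, 5.0, 2.0]))  # Example 2.22(i): close to [2.0, 1.0]
```

Choosing the zeros outside the unit circle instead would give another valid factor (for example, [1, 2] in place of [2, 1]), which is the freedom described above.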

Example 2.22 (Spectral factorization) We now illustrate both the procedure and how we can recognize a deterministic autocorrelation of a real and stable sequence (see Figure 2.14):

(i) The first sequence we examine is a finite-length, symmetric sequence $a_n$ with its associated z-transform:

$$ a_n = 2\delta_{n+1} + 5\delta_n + 2\delta_{n-1}, \qquad A(z) = 5 + 2(z + z^{-1}) = (1 + 2z^{-1})(1 + 2z), $$

depicted in Figure 2.14(a)–(b). This sequence is a deterministic autocorrelation since it has two zeros and they appear in a pair as per Theorem 2.13. As we said above, we have a choice whether to assign $-\tfrac{1}{2}$ or $-2$ to X(z); the minimum-phase solution assigns $-\tfrac{1}{2}$ to X(z) and $-2 = (-\tfrac{1}{2})^{-1}$ to $X(z^{-1})$.



52 Note that in (2.102) we could have written the elements below the diagonal, for example, $C_{0,1}^*(e^{j\omega})$, as $C_{0,1,*}(e^{-j\omega})$ to parallel the z-transform. Here, the subscript * would just mean conjugation of coefficients, as conjugation of $e^{j\omega}$ is taken care of by negation.

53 Note that since $A(e^{j\omega})$ is real and nonnegative, one could write $X(e^{j\omega}) = \sqrt{A(e^{j\omega})}$. However, such a spectral root will in general not be rational.

54 The name stems from the fact that among the various solutions, this one will create a minimal delay, or, that the sequence is most concentrated towards the origin of time.
(ii) The second sequence is an infinite-length, symmetric sequence $a_n$ with its associated z-transform:

$$ a_n = \big(\tfrac{1}{2}\big)^n u_n + 2^n u_{-n-1}, \qquad A(z) = \frac{1}{1 - \frac{1}{2} z^{-1}} - \frac{1}{1 - 2 z^{-1}} = \frac{-\frac{3}{2} z^{-1}}{(1 - \frac{1}{2} z^{-1})(1 - 2 z^{-1})}, \qquad \tfrac{1}{2} < |z| < 2, $$

depicted in Figure 2.14(c)–(d). Above, we have used the z-transform pairs from Table 2.6. This sequence is a deterministic autocorrelation since it has two poles and they appear in a pair as per Theorem 2.13. We now have no choice but to assign $\tfrac{1}{2}$ to X(z), as for a stable sequence all its poles must be inside the unit circle; the other pole, $2 = (\tfrac{1}{2})^{-1}$, goes to $X(z^{-1})$.
(iii) Finally, we examine the following finite-length, symmetric sequence $a_n$ with its associated z-transform:

$$ a_n = 2\delta_{n+1} + 7\delta_n + 7\delta_{n-1} + 2\delta_{n-2}, \qquad A(z) = 7(1 + z^{-1}) + 2(z + z^{-2}) = 2z\,(1 + \tfrac{1}{2} z^{-1})(1 + 2 z^{-1})(1 + z^{-1}), $$

depicted in Figure 2.14(e)–(f). This sequence is not a deterministic autocorrelation since it has three zeros, two appearing in a pair as in Part (i) and the third, a single zero on the unit circle at z = −1, violating Theorem 2.13. The DTFT of $a_n$ is not real; for example, $A(e^{-j\pi/2}) = 5(1 + j)$.
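The three parts of this example can be checked numerically. The sketch below (helper names are ours) confirms that both factor choices in (i) reproduce the same $a_n$; that in (ii) the factor $x_n = (\sqrt{3}/2)(1/2)^n u_n$ reproduces $a_n = (1/2)^{|n|}$ (the scale $\sqrt{3}/2$ is our own computation from matching $A(1) = 3$); and that in (iii) the DTFT at $\omega = -\pi/2$ is not real.

```python
import cmath
import math

def autocorr_real(x):
    """a_n = sum_k x_k x_{k-n} for a real finite causal x; returns {n: a_n}."""
    L = len(x)
    return {n: sum(x[k] * x[k - n] for k in range(L) if 0 <= k - n < L)
            for n in range(-(L - 1), L)}

def dtft(seq, omega):
    """A(e^{j omega}) = sum_n a_n e^{-j omega n} for a dict {n: a_n}."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in seq.items())

# (i) Minimum-phase [2, 1] and maximum-phase [1, 2] factors give the same a_n.
a_min = autocorr_real([2.0, 1.0])
a_max = autocorr_real([1.0, 2.0])
print(a_min, a_max)  # both {-1: 2.0, 0: 5.0, 1: 2.0}

# (ii) x_n = (sqrt(3)/2) (1/2)^n u_n, truncated to 60 samples.
x = [math.sqrt(3) / 2 * 0.5 ** n for n in range(60)]
a_ii = autocorr_real(x)
print([a_ii[n] for n in range(-3, 4)])

# (iii) a_n = 2 d_{n+1} + 7 d_n + 7 d_{n-1} + 2 d_{n-2} has a complex DTFT.
a_iii = {-1: 2.0, 0: 7.0, 1: 7.0, 2: 2.0}
print(dtft(a_iii, -math.pi / 2))  # approximately (5+5j)
```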



2.5.4 z-Transform of Filters

For filters,

$$ H(z) = \sum_{n\in\mathbb{Z}} h_n z^{-n} \tag{2.150} $$

is the counterpart of the frequency response in (2.105a); it is well defined for values of z for which $h_n z^{-n}$ is absolutely summable. As mentioned previously, there is a one-to-one relationship between a rational z-transform and a realizable difference equation (one with a finite number of coefficients). After revisiting this relationship, we establish a necessary and sufficient condition for stability of causal systems with rational transfer functions.

Difference Equations with Finite Number of Coefficients Consider a causal solution of a difference equation with a finite number of terms as in (2.54) with zero initial conditions. Assuming x and y have well-defined z-transforms X(z) and Y(z), and using that $x_{n-k}$ and $z^{-k} X(z)$ are a z-transform pair, we can rewrite (2.54) as

$$ Y(z) = \Big( \sum_{k=0}^{M} b_k z^{-k} \Big) X(z) - \Big( \sum_{k=1}^{N} a_k z^{-k} \Big) Y(z). $$

The transfer function is given by

$$ H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}. \tag{2.151} $$
Figure 2.14: Pole/zero locations of rational autocorrelations. (a) Finite-length, symmetric sequence that is a deterministic autocorrelation and (b) its zero locations. (c) Infinite-length, symmetric sequence that is a deterministic autocorrelation and (d) its pole locations. (e) Finite-length, symmetric sequence that is not a deterministic autocorrelation and (f) its zero locations.



That is, a linear discrete-time system satisfying difference equation (2.54) has a rational transfer function H(z) in the z-transform domain; in other words, the z-transform of the impulse response of the system is a rational function.

Example 2.23 (Rational transfer function) Consider the simple system in Figure 2.15. The constraint at the summing node gives

$$ y_n = x_n + b_1 x_{n-1} - a_1 y_{n-1} - a_2 y_{n-2}. $$

By using z-transform properties, this implies

$$ Y(z) = X(z) + b_1 z^{-1} X(z) - a_1 z^{-1} Y(z) - a_2 z^{-2} Y(z). $$
Figure 2.15: A simple discrete-time system, where $z^{-1}$ stands for a unit delay.

The system transfer function is therefore

$$ H(z) = \frac{Y(z)}{X(z)} = \frac{1 + b_1 z^{-1}}{1 + a_1 z^{-1} + a_2 z^{-2}}. $$
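To connect the difference equation with the rational transfer function, the sketch below runs the recursion of Example 2.23 on a Kronecker delta (zero initial state) to get the impulse response, then checks that $\sum_n h_n z^{-n}$ matches $H(z)$ at a point in the ROC. The coefficient values are hypothetical, chosen so the poles sit at 0.4 and 0.5.

```python
def impulse_response(b1, a1, a2, n_samples):
    """Iterate y_n = x_n + b1 x_{n-1} - a1 y_{n-1} - a2 y_{n-2} with x = delta, zero state."""
    h = []
    for n in range(n_samples):
        x_n = 1.0 if n == 0 else 0.0
        x_prev = 1.0 if n == 1 else 0.0
        y_prev1 = h[n - 1] if n >= 1 else 0.0
        y_prev2 = h[n - 2] if n >= 2 else 0.0
        h.append(x_n + b1 * x_prev - a1 * y_prev1 - a2 * y_prev2)
    return h

def transfer_function(b1, a1, a2, z):
    """H(z) = (1 + b1 z^{-1}) / (1 + a1 z^{-1} + a2 z^{-2})."""
    return (1 + b1 / z) / (1 + a1 / z + a2 / z ** 2)

b1, a1, a2 = 0.5, -0.9, 0.2   # hypothetical values; poles at z = 0.4 and z = 0.5
h = impulse_response(b1, a1, a2, 200)
z = 1.5
series = sum(hn * z ** (-n) for n, hn in enumerate(h))
print(series, transfer_function(b1, a1, a2, z))
```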
We now discuss stability for systems with rational transfer functions. 



Proposition 2.15 (Stability) A causal, LSI discrete-time system with a rational transfer function is BIBO stable if and only if the poles of its (reduced) transfer function are inside the unit circle.



Proof. Using the partial fraction inversion method described in Section 2.5.2, the impulse response of a causal, LSI discrete-time system with rational transfer function is a linear combination of right-sided geometric sequences as in (2.129b), possibly with multiplication by $n^k$ factors (stemming from multiplicities of poles) and with additional terms that are shifted Kronecker delta sequences (from the numerator having higher degree than the denominator as in (2.130b)). When each pole is inside the unit circle, each term in this linear combination is absolutely summable, so the impulse response is absolutely summable as well; thus, according to Proposition 2.8, the system is BIBO stable. Conversely, if any pole is on or outside the unit circle, the corresponding term does not decay, so the impulse response is not absolutely summable; thus, according to Proposition 2.8, the system is not BIBO stable.
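For a second-order denominator such as the one in Example 2.23, Proposition 2.15 reduces to checking the two roots of $z^2 + a_1 z + a_2$. This sketch assumes the transfer function is already reduced (no pole-zero cancellation); the function names are our own.

```python
import cmath

def poles_second_order(a1, a2):
    """Poles of 1 / (1 + a1 z^{-1} + a2 z^{-2}), i.e., roots of z^2 + a1 z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + disc) / 2, (-a1 - disc) / 2

def is_bibo_stable(a1, a2):
    """Causal system: BIBO stable iff every (reduced) pole is strictly inside |z| = 1."""
    return all(abs(p) < 1 for p in poles_second_order(a1, a2))

print(is_bibo_stable(-0.9, 0.2))  # True: poles 0.5 and 0.4
print(is_bibo_stable(-2.5, 1.0))  # False: poles 2 and 0.5
```

A pair of poles exactly on the unit circle (for example, $a_1 = 0$, $a_2 = 1$, giving $\pm j$) is correctly reported as unstable, since the test is strict.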



Filters 

A major application of the z-transform is in the analysis and design of filters. With 
the restriction to rational functions for realizability, designing a desirable filter is 
essentially the problem of strategically placing poles and zeros in the z-plane. As 
simple as this may sound, filter design is a rather sophisticated problem, and it 
has led to a vast literature and numerous numerical procedures. For example, a 
standard way to design FIR filters is to use a numerical optimization procedure 
such as the Parks-McClellan algorithm, which iteratively modifies coefficients so 
as to approach a desired frequency response. Rather than embarking on a tour of 
filter design, we study properties of certain classes of filters; pointers to filter design 
techniques are given in Further Reading. 
FIR Filters The z-transform of a length-L FIR filter is a polynomial in $z^{-1}$,

$$ H(z) = \sum_{n=0}^{L-1} h_n z^{-n}, $$

and is given in its factored form as (2.126).



Linear-Phase Filters In the z-domain, the symmetries from (2.110) become:

$$ \text{symmetric} \;\Leftrightarrow\; H(z) = z^{-L+1} H(z^{-1}), \tag{2.152a} $$

$$ \text{antisymmetric} \;\Leftrightarrow\; H(z) = -z^{-L+1} H(z^{-1}). \tag{2.152b} $$

In the z-domain, $H(z^{-1})$ reverses the filter, $z^{-L+1}$ makes it causal again, and ± determines the type of symmetry.
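A quick sketch of (2.152a)–(2.152b): evaluate both sides at one arbitrary complex test point for a symmetric and an antisymmetric length-5 filter. The filters and the test point are hypothetical, and a single-point check is a sanity check rather than a proof.

```python
def H(h, z):
    """H(z) = sum_{n=0}^{L-1} h_n z^{-n}."""
    return sum(hn * z ** (-n) for n, hn in enumerate(h))

def has_symmetry(h, sign, z=0.7 + 1.1j):
    """Check H(z) = sign * z^{-(L-1)} H(z^{-1}) at one test point (sign = +1 or -1)."""
    L = len(h)
    return abs(H(h, z) - sign * z ** (-(L - 1)) * H(h, 1 / z)) < 1e-12

h_sym = [1.0, 3.0, -2.0, 3.0, 1.0]    # h_n = h_{L-1-n}
h_anti = [1.0, -2.0, 0.0, 2.0, -1.0]  # h_n = -h_{L-1-n}
print(has_symmetry(h_sym, +1), has_symmetry(h_anti, -1))
```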

Allpass Filters The basic single-zero/single-pole allpass building block given in (2.115) has the z-transform

$$ H(z) = \frac{z^{-1} - \alpha^*}{1 - \alpha z^{-1}}, \tag{2.153} $$

with the zero $1/\alpha^*$ and pole α. For stability in the causal case, |α| < 1 is required. A more general allpass filter is formed by cascading these elementary building blocks as

$$ H(z) = \prod_{i=1}^{N} \frac{z^{-1} - \alpha_i^*}{1 - \alpha_i z^{-1}} = z^{-N} \frac{B_*(z^{-1})}{B(z)}, $$

where $B(z) = \prod_{i=1}^{N} (1 - \alpha_i z^{-1})$. The z-transform of the deterministic autocorrelation of such an allpass filter is given by

$$ A(z) = H(z)\, H_*(z^{-1}) = \prod_{i=1}^{N} \frac{z^{-1} - \alpha_i^*}{1 - \alpha_i z^{-1}} \prod_{i=1}^{N} \frac{z - \alpha_i}{1 - \alpha_i^* z} = 1, $$

and thus, an allpass filter has the deterministic autocorrelation sequence $a_m = \delta_m$. Poles and zeros appear in pairs as $\{\alpha, 1/\alpha^*\} = \{ r_0 e^{j\omega_0}, (1/r_0) e^{j\omega_0} \}$ for some real $r_0 \in (0, 1)$ and angle $\omega_0$. They appear across the unit circle at the same angle and at reciprocal magnitudes, and thus the magnitude $|H(e^{j\omega})|$ is not influenced while the phase is, as was shown in Figure 2.12.
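The sketch below evaluates a cascade of the first-order sections (2.153) on the unit circle (the magnitude stays at 1 while the phase varies) and checks $H(z)H_*(z^{-1}) = 1$ at an off-circle point, with $H_*(w)$ computed as $\overline{H(\bar{w})}$. The pole values are hypothetical.

```python
import cmath
import math

def allpass(alphas, z):
    """Cascade of sections (z^{-1} - conj(alpha)) / (1 - alpha z^{-1})."""
    out = 1 + 0j
    for alpha in alphas:
        out *= (1 / z - alpha.conjugate()) / (1 - alpha / z)
    return out

alphas = [0.5 * cmath.exp(0.3j), -0.7 + 0.1j]  # hypothetical poles with |alpha| < 1
for omega in (0.0, 0.9, 2.0, math.pi):
    H = allpass(alphas, cmath.exp(1j * omega))
    print(f"omega={omega:.2f}  |H|={abs(H):.12f}  phase={cmath.phase(H):+.4f}")
```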

2.6 Discrete Fourier Transform

As mentioned previously in this chapter, one way in which a finite-length sequence arises is as one period of an infinite-length periodic sequence. The version of the Fourier transform designed for finite-length sequences treats all finite-length sequences this way, so effectively we are circularly extending any finite-length sequence. As we have seen in Section 2.3.3, the circular convolution operator (2.69) is the appropriate description of LSI systems operating on finite-length inputs, circularly extended.

The version of the Fourier transform for this combination of sequence space 
and convolution is the discrete Fourier transform. Similarly to our discussion on 
eigensequences of the linear convolution operator leading to the definition of the 
DTFT, we will find appropriate eigensequences of the circular convolution operator 
leading to the DFT. As expected from this construction, the DFT diagonalizes the 
circular convolution operator. 

One of the most important uses of the DFT and circular convolution arises 
from their connections with the DTFT and linear convolution. Specifically, we will 
see that the DFT is a tool for fast computation of linear convolutions. 

2.6.1 Definition of the DFT 

Eigensequences of the Circular Convolution Operator Given that we have an appropriate convolution operator defined (circular convolution in (2.69)), mimicking what we did for the DTFT, the DFT arises from identifying the eigensequences of the convolution operator. We can guess that the eigensequences are of unit-modulus complex exponential form $v_n = e^{j\omega n}$, as in Section 2.4.1. In addition, since we will represent sequences of period N with these eigensequences, we can guess that the eigensequences should be periodic with period N as well. Due to the periodicity,

$$ v_{n+N} = e^{j\omega(n+N)} = v_n \;\Rightarrow\; e^{j\omega N} = 1 \;\Rightarrow\; \omega = \frac{2\pi}{N} k, \tag{2.155} $$

for $k \in \mathbb{Z}$. Since k and $k + \ell N$ lead to the same complex exponential sequence, we have N distinct complex exponential sequences of period N, indexed by $k \in \{0, 1, \ldots, N-1\}$ instead of $\omega \in \mathbb{R}$:

$$ v_k = \big[ e^{j(2\pi/N)kn} \big]_{n=0}^{N-1} = \big[ W_N^{-kn} \big]_{n=0}^{N-1} = \begin{bmatrix} 1 & W_N^{-k} & \cdots & W_N^{-(N-1)k} \end{bmatrix}^T, \tag{2.156} $$

where $W_N = e^{-j2\pi/N}$ (for details, see (2.276) in Appendix 2.A.3).

Let us check that these are indeed eigensequences of the circular convolution operator H from (2.69):

$$ (H v_k)_n = (h \circledast v_k)_n = \sum_{i=0}^{N-1} v_{k,(n-i) \bmod N}\, h_i = \sum_{i=0}^{N-1} W_N^{-k((n-i) \bmod N)} h_i \overset{(a)}{=} \sum_{i=0}^{N-1} W_N^{-k(n-i)} h_i = \Big( \sum_{i=0}^{N-1} h_i W_N^{ki} \Big) W_N^{-kn} = \lambda_k v_{k,n}, \tag{2.157} $$

where (a) follows from the fact that if $n - i = \ell N + p$, then $(n - i) \bmod N = p$ and $W_N^{-k((n-i) \bmod N)} = W_N^{-kp}$, but also $W_N^{-k(n-i)} = W_N^{-k(\ell N + p)} = W_N^{-kp}$. Thus indeed, applying the convolution operator H to the complex exponential sequence $v_k$ results in the same sequence, albeit scaled by the corresponding eigenvalue $\lambda_k$. As before, we call that eigenvalue the frequency response $H_k$ of the system; it is defined formally in (2.176a). We can thus rewrite (2.157) as

$$ H W_N^{-kn} = h \circledast W_N^{-kn} = H_k W_N^{-kn}. \tag{2.158} $$

Each of the N distinct eigensequences spans a one-dimensional subspace $S_k = \{ \alpha e^{j(2\pi/N)kn} \mid \alpha \in \mathbb{C} \}$. The quantity k is called the discrete frequency.
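The eigen-relation (2.157) is easy to confirm numerically: circularly convolving any filter with $v_k$ just scales $v_k$ by $H_k = \sum_i h_i W_N^{ki}$. The filter below is hypothetical and the helper name is ours.

```python
import cmath

def circular_convolve(h, v):
    """(h circ-conv v)_n = sum_i v_{(n - i) mod N} h_i, n = 0..N-1."""
    N = len(h)
    return [sum(v[(n - i) % N] * h[i] for i in range(N)) for n in range(N)]

N = 6
W = cmath.exp(-2j * cmath.pi / N)    # W_N
h = [1.0, -2.0, 0.5, 0.0, 3.0, 1.5]  # hypothetical filter

for k in range(N):
    v_k = [W ** (-k * n) for n in range(N)]            # eigensequence, (2.156)
    H_k = sum(h[i] * W ** (k * i) for i in range(N))   # eigenvalue, as in (2.157)
    out = circular_convolve(h, v_k)
    print(k, max(abs(out[n] - H_k * v_k[n]) for n in range(N)))
```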

DFT Finding the appropriate Fourier transform now amounts to projecting onto each of the invariant subspaces $S_k$.

Definition 2.16 (Discrete Fourier transform) The discrete Fourier transform of a length-N sequence x is

$$ X_k = (F x)_k = \sum_{n=0}^{N-1} x_n W_N^{kn}, \qquad k \in \{0, 1, \ldots, N-1\}; \tag{2.159a} $$

we call it the spectrum of x. The inverse DFT of a length-N sequence X is

$$ x_n = \frac{1}{N} (F^* X)_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k W_N^{-kn}, \qquad n \in \{0, 1, \ldots, N-1\}. \tag{2.159b} $$

We denote the DFT pair as

$$ x_n \;\overset{\text{DFT}}{\longleftrightarrow}\; X_k. $$

Within the definition, we have introduced $F: \mathbb{C}^N \to \mathbb{C}^N$ to represent the linear DFT operator. The relationship between the inverse and the adjoint in (2.159b) is verified shortly.
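A direct $O(N^2)$ sketch of (2.159a)–(2.159b), useful for checking small examples by hand (a fast Fourier transform would be used in practice). For the hypothetical input below, with N = 4 and $W_4 = -j$, the DFT works out to $[2,\ 1-3j,\ 0,\ 1+3j]$, and the inverse recovers x.

```python
import cmath

def dft(x):
    """X_k = sum_n x_n W_N^{kn}, W_N = exp(-j 2 pi / N)  (2.159a)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

def idft(X):
    """x_n = (1/N) sum_k X_k W_N^{-kn}  (2.159b)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)
print(X)        # approximately [2, 1-3j, 0, 1+3j]
print(idft(X))  # approximately x
```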

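DFT as an Orthonormal Basis If we define

$$ \varphi_k = \frac{1}{\sqrt{N}} v_k = \frac{1}{\sqrt{N}} \big[ W_N^{-kn} \big]_{n=0}^{N-1}, \qquad k \in \{0, 1, \ldots, N-1\}, \tag{2.160} $$

it is easy to see from (2.159a)–(2.159b), as well as from the orthogonality of the $v_k$ (easily checked using (2.277c)), that the set $\{\varphi_k\}_{k=0}^{N-1}$ forms an orthonormal basis for $\mathbb{C}^N$.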

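Matrix View We know how to form a matrix given an orthonormal basis by stacking the basis vectors as its columns; however, following the community standard, it is the unnormalized basis vectors $v_k$ from (2.156) that go into the columns of the matrix $F^*$, not F. Thus, the DFT operator F and its inverse $F^{-1}$ are given by:

$$ F = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\ 1 & W_N^2 & W_N^4 & \cdots & W_N^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2} \end{bmatrix}, \tag{2.161a} $$

$$ F^{-1} = \frac{1}{N} \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N^{-1} & W_N^{-2} & \cdots & W_N^{-(N-1)} \\ 1 & W_N^{-2} & W_N^{-4} & \cdots & W_N^{-2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{-(N-1)} & W_N^{-2(N-1)} & \cdots & W_N^{-(N-1)^2} \end{bmatrix} = \frac{1}{N} F^*. \tag{2.161b} $$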



This shows once more that the DFT is a unitary operator (up to a scaling factor).$^{55}$ Note that F is a Vandermonde matrix (see (1.230)). Its determinant satisfies

$$ \det(F) = \frac{1}{\det(F^{-1})} = \frac{N^N}{\det(F^*)}, \qquad \text{so that} \qquad |\det(F)| = N^{N/2}. \tag{2.162} $$



Relation of the DFT to the DTFT When given a length-N sequence x to analyze, we might first turn to what we already have, the DTFT. To turn it into a tool for analyzing length-N sequences, we can simply sample it. As we need only N points, we can choose them anywhere; let us choose N uniformly spaced points in frequency given by (2.155). Then, evaluate the DTFT at these N distinct points:

$$ X(e^{j\omega})\big|_{\omega = (2\pi/N)k} \overset{(a)}{=} X(e^{j(2\pi/N)k}) \overset{(b)}{=} \sum_{n\in\mathbb{Z}} x_n e^{-j(2\pi/N)kn} \overset{(c)}{=} \sum_{n=0}^{N-1} x_n e^{-j(2\pi/N)kn} = X_k, $$

where (a) follows from sampling the DTFT uniformly at $\omega = 2\pi k/N$; (b) from the expression for the DTFT (2.78a); and (c) from x being finite of length N. The final expression is the one for the DFT we have seen in (2.159a). Thus, sampling the DTFT results in the DFT.
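As a sketch of this relation, the code evaluates the DTFT of a hypothetical length-5 sequence at $\omega = 2\pi k/N$ and compares the samples with the DFT computed from (2.159a).

```python
import cmath

def dtft(x, omega):
    """X(e^{j omega}) = sum_n x_n e^{-j omega n} for x supported on n = 0..N-1."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

def dft(x):
    """X_k = sum_n x_n W_N^{kn} with W_N = exp(-j 2 pi / N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

x = [0.5, -1.0, 2.0, 0.0, 1.0]   # hypothetical length-5 sequence
N = len(x)
samples = [dtft(x, 2 * cmath.pi * k / N) for k in range(N)]   # omega = 2 pi k / N
print(max(abs(s - Xk) for s, Xk in zip(samples, dft(x))))
```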

We can now easily find the appropriate convolution operator corresponding to the DFT (while we know what it is already, we reverse engineer it here). The way to do it is to ask ourselves which operator will have as its eigensequences the columns of the DFT matrix $F^*$ (since then, that operator is diagonalized by the DFT). In other words, which operator H will satisfy the following:

$$ H v_k = \lambda_k v_k \;\Rightarrow\; H F^* = F^* \Lambda \;\Rightarrow\; H = \frac{1}{N} F^* \Lambda F. $$



55 A normalized version uses a factor $1/\sqrt{N}$ on both F and its inverse.
Even though we do not know Λ, we know it is a diagonal matrix of eigenvalues $\lambda_k$. Using the expressions (2.161a)–(2.161b) for F and $F^*$, we can find the expression for the element $H_{\ell,m}$ as

$$ H_{\ell,m} = \frac{1}{N} \sum_{i=0}^{N-1} \lambda_i W_N^{-i(\ell - m)}, $$

which shows H to be a circulant matrix (circular convolution): each entry depends only on $(\ell - m) \bmod N$. We leave the details of the derivation as an exercise.

What we have seen in this short account is how to reverse engineer the circular convolution given a finite-length sequence and the DFT.

2.6.2 Properties of the DFT

We list here the basic properties of the DFT; Table 2.7 summarizes these, together with symmetries as well as standard transform pairs, while Exercise 2.19 explores proofs for some of the properties.

Linearity The DFT operator F is a linear operator, or,

$$ \alpha x_n + \beta y_n \;\overset{\text{DFT}}{\longleftrightarrow}\; \alpha X_k + \beta Y_k. \tag{2.163} $$

Shift in Time The DFT pair corresponding to a shift in time by $n_0$ is

$$ x_{(n-n_0) \bmod N} \;\overset{\text{DFT}}{\longleftrightarrow}\; W_N^{k n_0} X_k. \tag{2.164} $$

Shift in Frequency The DFT pair corresponding to a shift in frequency by $k_0$ is

$$ W_N^{-k_0 n} x_n \;\overset{\text{DFT}}{\longleftrightarrow}\; X_{(k-k_0) \bmod N}. \tag{2.165} $$

As for the DTFT, a shift in frequency is often referred to as modulation in time, and is dual to the shift in time.

Time Reversal The DFT pair corresponding to a time reversal $x_{-n \bmod N}$ is

$$ x_{-n \bmod N} \;\overset{\text{DFT}}{\longleftrightarrow}\; X_{-k \bmod N}; \tag{2.166} $$

the proof is given in Exercise 2.19.

Convolution in Time The DFT pair corresponding to convolution in time is

$$ (h \circledast x)_n \;\overset{\text{DFT}}{\longleftrightarrow}\; H_k X_k. \tag{2.167} $$

As this is a central concept in signal processing, it is worth repeating the following: (1) given a finite-length sequence and a finite-length filter, their linear convolution can be computed using a circular convolution of appropriate length as in Proposition 2.10; and (2) the DFT operator F diagonalizes the circular convolution operator H as in (2.177).
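Point (1) can be sketched as follows: zero-pad both sequences to length $M + N - 1$, so that circular convolution coincides with linear convolution, then multiply DFTs and invert. The direct DFT here is the $O(N^2)$ sum (an FFT would be used in practice), and the example reuses hypothetical sequences whose linear convolution is [4, 8, 11, 3, 7, 15].

```python
import cmath

def dft(x, inverse=False):
    """Direct DFT (2.159a), or inverse DFT (2.159b) when inverse=True."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def linear_convolution_via_dft(h, x):
    """Zero-pad both inputs to len(h) + len(x) - 1 so that circular = linear convolution."""
    L = len(h) + len(x) - 1
    H = dft(list(h) + [0] * (L - len(h)))
    X = dft(list(x) + [0] * (L - len(x)))
    y = dft([Hk * Xk for Hk, Xk in zip(H, X)], inverse=True)
    return [v.real for v in y]  # real inputs, so imaginary parts are round-off

y = linear_convolution_via_dft([1, 2, 3], [4, 0, -1, 5])
print([round(v, 10) for v in y])  # approximately [4, 8, 11, 3, 7, 15]
```

Without the zero padding (for example, using length 4 for both), the tail of the linear convolution would wrap around and add onto the first samples.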
Table 2.7: Properties of the DFT.

Property | Time domain | DFT domain

Basic properties
Linearity | $\alpha x_n + \beta y_n$ | $\alpha X_k + \beta Y_k$
Shift in time | $x_{(n-n_0) \bmod N}$ | $W_N^{k n_0} X_k$
Shift in frequency | $W_N^{-k_0 n} x_n$ | $X_{(k-k_0) \bmod N}$
Time reversal | $x_{-n \bmod N}$ | $X_{-k \bmod N}$
Convolution in time | $(h \circledast x)_n$ | $H_k X_k$
Convolution in frequency | $h_n x_n$ | $(1/N)(H \circledast X)_k$
Deterministic autocorrelation | $a_n = \sum_{k=0}^{N-1} x_k x^*_{(k-n) \bmod N}$ | $A_k = \lvert X_k \rvert^2$
Deterministic crosscorrelation | $c_n = \sum_{k=0}^{N-1} x_k y^*_{(k-n) \bmod N}$ | $C_k = X_k Y_k^*$
Parseval's equality | $\lVert x \rVert^2 = \sum_{n=0}^{N-1} \lvert x_n \rvert^2 = (1/N) \sum_{k=0}^{N-1} \lvert X_k \rvert^2 = (1/N) \lVert X \rVert^2$ |

Symmetries
Conjugate | $x_n^*$ | $X^*_{-k \bmod N}$
Conjugate, time reversed | $x^*_{-n \bmod N}$ | $X_k^*$
Real part | $\Re(x_n)$ | $(X_k + X^*_{-k \bmod N})/2$
Imaginary part | $\Im(x_n)$ | $(X_k - X^*_{-k \bmod N})/(2j)$
Conjugate-symmetric part | $(x_n + x^*_{-n \bmod N})/2$ | $\Re(X_k)$
Conjugate-antisymmetric part | $(x_n - x^*_{-n \bmod N})/(2j)$ | $\Im(X_k)$

Symmetries for real x
X conjugate symmetric | | $X_k = X^*_{-k \bmod N}$
Real part of X even | | $\Re(X_k) = \Re(X_{-k \bmod N})$
Imaginary part of X odd | | $\Im(X_k) = -\Im(X_{-k \bmod N})$
Magnitude of X even | | $\lvert X_k \rvert = \lvert X_{-k \bmod N} \rvert$
Phase of X odd | | $\arg X_k = -\arg X_{-k \bmod N}$

Common transform pairs
Kronecker delta sequence | $\delta_n$ | $1$
Shift by $n_0$ | $\delta_{(n-n_0) \bmod N}$ | $W_N^{k n_0}$
Exponential sequence | $\alpha^n$ | $(1 - \alpha^N)/(1 - \alpha W_N^k)$
Ideal lowpass filter | $k_0 \operatorname{sinc}(\pi n k_0/N) / (N \operatorname{sinc}(\pi n/N))$ | $1$ for $\lvert k \rvert \le (k_0-1)/2$; $0$, otherwise
Box sequence | $1$ for $\lvert n \rvert \le (n_0-1)/2$; $0$, otherwise | $n_0 \operatorname{sinc}(\pi n_0 k/N) / \operatorname{sinc}(\pi k/N)$



Convolution in Frequency The DFT pair corresponding to convolution in frequency is

$$ h_n x_n \;\overset{\text{DFT}}{\longleftrightarrow}\; \frac{1}{N} (H \circledast X)_k; \tag{2.168} $$

the proof is given in Exercise 2.19. As expected, convolution in frequency is dual to convolution in time.

Deterministic Autocorrelation The DFT pair corresponding to the deterministic autocorrelation of a sequence x is

$$ a_n = \sum_{k=0}^{N-1} x_k x^*_{(k-n) \bmod N} \;\overset{\text{DFT}}{\longleftrightarrow}\; A_k = |X_k|^2, \tag{2.169} $$

for n, k = 0, 1, ..., N − 1, and satisfies

$$ A_k = A_k^*, \tag{2.170a} $$

that is, $A_k$ is real (and nonnegative). For a real x,

$$ A_k = |X_k|^2 = A_{-k \bmod N}. \tag{2.170b} $$



Deterministic Crosscorrelation The DFT pair corresponding to the deterministic crosscorrelation of sequences x and y is

$$ c_n = \sum_{k=0}^{N-1} x_k y^*_{(k-n) \bmod N} \;\overset{\text{DFT}}{\longleftrightarrow}\; C_k = X_k Y_k^*, \tag{2.171} $$

and satisfies

$$ C_{x,y,k} = C^*_{y,x,k}. \tag{2.172a} $$

For x, y real,

$$ C_{x,y,k} = X_k Y_{-k \bmod N} = C_{y,x,-k \bmod N}. \tag{2.172b} $$
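Both pairs (2.169) and (2.171) can be checked with a direct DFT on short complex sequences (hypothetical values). The check also confirms (2.170a): the DFT of the autocorrelation is real.

```python
import cmath

def dft(x):
    """X_k = sum_n x_n exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_crosscorrelation(x, y):
    """c_n = sum_k x_k conj(y_{(k - n) mod N}), n = 0..N-1, as in (2.171)."""
    N = len(x)
    return [sum(x[k] * y[(k - n) % N].conjugate() for k in range(N)) for n in range(N)]

x = [1 + 1j, 2.0, -1j, 0.5]   # hypothetical complex sequences
y = [0.5, -1.0, 1j, 2 - 1j]
X, Y = dft(x), dft(y)
C = dft(circular_crosscorrelation(x, y))
A = dft(circular_crosscorrelation(x, x))  # autocorrelation, (2.169)
print(max(abs(C[k] - X[k] * Y[k].conjugate()) for k in range(4)))
print(max(abs(A[k] - abs(X[k]) ** 2) for k in range(4)))
```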



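Deterministic Autocorrelation of Vector Sequences The DFT pair corresponding to the deterministic autocorrelation of a vector sequence x is

$$ A_n \;\overset{\text{DFT}}{\longleftrightarrow}\; A_k = \begin{bmatrix} A_{0,k} & C_{0,1,k} & \cdots & C_{0,N-1,k} \\ C_{1,0,k} & A_{1,k} & \cdots & C_{1,N-1,k} \\ \vdots & \vdots & \ddots & \vdots \\ C_{N-1,0,k} & C_{N-1,1,k} & \cdots & A_{N-1,k} \end{bmatrix}, \tag{2.173} $$

and satisfies

$$ A_k = \begin{bmatrix} A_{0,k} & C_{0,1,k} & \cdots & C_{0,N-1,k} \\ C^*_{0,1,k} & A_{1,k} & \cdots & C_{1,N-1,k} \\ \vdots & \vdots & \ddots & \vdots \\ C^*_{0,N-1,k} & C^*_{1,N-1,k} & \cdots & A_{N-1,k} \end{bmatrix} = A_k^*. \tag{2.174a} $$

For a real vector of sequences x,

$$ A_k = A^T_{-k \bmod N}. \tag{2.174b} $$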
Parseval's Equality The DFT operator F is a unitary operator (within scaling) and thus preserves the Euclidean norm (within scaling; see (1.51)):

$$ \|x\|^2 = \sum_{n=0}^{N-1} |x_n|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X_k|^2 = \frac{1}{N} \|X\|^2 = \frac{1}{N} \|F x\|^2. \tag{2.175} $$

This follows from $F/\sqrt{N}$ being a unitary matrix, since $F^* F = N I$.
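A short numerical illustration of (2.175) for a hypothetical complex sequence: the energy in time equals $1/N$ times the energy of the DFT coefficients (here both are 20.25 up to round-off).

```python
import cmath

def dft(x):
    """X_k = sum_n x_n exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1 + 2j, -0.5, 3j, 2.0, -1 - 1j]   # hypothetical sequence
X = dft(x)
energy_time = sum(abs(v) ** 2 for v in x)
energy_freq = sum(abs(v) ** 2 for v in X) / len(x)   # the 1/N from F* F = N I
print(energy_time, energy_freq)
```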

2.6.3 Frequency Response of Filters

As for the DTFT, the DFT is defined for sequences, and we use spectrum to denote their DFTs. The frequency response is defined for filters (systems) as

$$ H_k = \sum_{n=0}^{N-1} h_n W_N^{kn}, \qquad k \in \{0, 1, \ldots, N-1\}, \tag{2.176a} $$

with the corresponding impulse response

$$ h_n = \frac{1}{N} \sum_{k=0}^{N-1} H_k W_N^{-kn}, \qquad n \in \{0, 1, \ldots, N-1\}. \tag{2.176b} $$

We can again denote the magnitude and phase as

$$ H_k = |H_k|\, e^{j \arg(H_k)}, $$

where $|H_k|$ is an N-periodic, real, nonnegative sequence (the magnitude response), and $\arg(H_k)$ is an N-periodic, real sequence with values in $(-\pi, \pi]$ (the phase response). A filter is said to possess linear phase when the phase response is linear in k. A filter is called bandlimited when its frequency response is finitely supported.

Diagonalization of the Circular Convolution Operator Call $\Lambda = \operatorname{diag}([H_0, H_1, \ldots, H_{N-1}])$, with $H_k$ the DFT coefficients of $h_n$ as in (2.176a). Then, from (2.157), we can immediately see that the DFT operator $F$ (2.161a) diagonalizes the circular convolution operator $H$ (2.73) as in (1.210a),
$$
H = F^{-1} \Lambda F, \quad\text{that is,}\quad F H F^{-1} = \Lambda. \tag{2.177}
$$
We illustrate this diagonalization in Figure 2.16 and explore it further in Solved Exercise 2.5.
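The diagonalization can be checked numerically without forming any matrices: applying the circulant operator $H$ directly (circular convolution with $h$) should give the same result as taking the DFT, multiplying by the diagonal entries $H_k$, and inverting the DFT. With the DFT convention $X_k = \sum_n x_n W_N^{kn}$ used here, that three-step product is $F^{-1}\Lambda F$. The minimal sketch also checks that the Fourier vector $v_n = W_N^{-kn}$ is an eigenvector of $H$ with eigenvalue $H_k$; the filter and input values are arbitrary.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def circular_convolve(h, x):
    """(H x)_n = sum_k h_k x_{(n-k) mod N}: the circulant operator H applied to x."""
    N = len(x)
    return [sum(h[k] * x[(n - k) % N] for k in range(N)) for n in range(N)]


h = [0.5, 0.25, 0.0, 0.0, 0.25]
x = [1.0, 2.0, -1.0, 0.0, 3.0]

direct = circular_convolve(h, x)                          # H x
H = dft(h)                                                # diagonal of Lambda
via_dft = idft([Hk * Xk for Hk, Xk in zip(H, dft(x))])    # F^{-1} Lambda F x

# Eigenvector check: v_n = W_N^{-kn} = exp(+j 2 pi k n / N) satisfies H v = H_k v.
k, N = 2, len(h)
v = [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]
Hv = circular_convolve(h, v)
```

In practice the DFTs would be computed with an FFT, which is what makes this diagonalization the standard fast way to apply circular convolution.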

Ideal Filters TBD. 

FIR Filters TBD. 

Linear-Phase Filters TBD. 
Figure 2.16: Diagonalization property of the DFT.



Allpass Filters TBD. 

2.7 Multirate Sequences and Systems 

So far, we considered sequences in time indexed by integers, and the time index was 
assumed to be the same for all sequences. Physically, this is as if we had observed 
a physical process, and each term in the sequence corresponded to a sample of the 
process taken at regular intervals (for example, every second). 

In multirate sequence processing, different sequences may have different time scales. Thus, the index $n$ of a sequence may refer to different physical times for different sequences. One may ask both why one would do this and how to move between these different scales. Let us look at a simple example. Start with a sequence $x_n$ and derive a downsampled sequence $y_n$ by dropping every other sample:
$$
y_n = x_{2n} \quad\longleftrightarrow\quad y = \begin{bmatrix} \cdots & x_{-2} & x_0 & x_2 & \cdots \end{bmatrix}^T. \tag{2.178}
$$
Clearly, if $x_n$ is the sample of a physical process taken at $t = n$, then $y_n$ is a sample taken at time $t = 2n$. In other words, $y_n$ has a timeline with intervals of 2 seconds if $x_n$ has a timeline with intervals of 1 second; the clock of the process runs at half the rate. The process above is called downsampling by 2, and while simple, it has a number of properties we will study in detail. For example, it is irreversible: once we remove samples, we cannot go back from $y_n$ to $x_n$. It is also shift varying, requiring more complicated analysis.

The dual operation to downsampling is upsampling. For example, upsampling a sequence $x_n$ by 2 results in a new sequence $y_n$ obtained by inserting a zero between every two samples:
$$
y_n = \begin{cases} x_{n/2}, & n \text{ even};\\ 0, & n \text{ odd}, \end{cases}
\quad\longleftrightarrow\quad
y = \begin{bmatrix} \cdots & x_{-1} & 0 & x_0 & 0 & x_1 & \cdots \end{bmatrix}^T. \tag{2.179}
$$
The index of $y_n$ corresponds to a time that is half of that for $x_n$. For example, if $x_n$ has an interval of 1 second between samples, then $y_n$ has intervals of 0.5 seconds; the clock of the process is twice as fast.
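For finite sequences stored as Python lists that start at index 0 (an assumption made only for this sketch; the text works with two-sided sequences), downsampling and upsampling by $N$ reduce to slicing and zero insertion. The function names are illustrative.

```python
def downsample(x, N=2):
    """y_n = x_{Nn}: keep every N-th sample, starting at index 0."""
    return x[::N]


def upsample(x, N=2):
    """y_n = x_{n/N} if N divides n, else 0: insert N - 1 zeros after each sample."""
    y = []
    for value in x:
        y.append(value)
        y.extend([0] * (N - 1))
    return y


x = [10, 11, 12, 13, 14, 15]
print(downsample(x))           # [10, 12, 14]
print(upsample([10, 11, 12]))  # [10, 0, 11, 0, 12, 0]
```

Note that `downsample` discards `x[1::2]` for good, which is the irreversibility mentioned above, whereas `downsample(upsample(x))` always returns `x`.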

What we just saw for rate changes by 2 can be done for any integer rate change, as well as for rational rate changes (the latter by combining upsampling by $N$ and downsampling by $M$). In addition, to smooth the sequence before dropping samples, downsampling is preceded by lowpass filtering, while to fill in the zeros, upsampling is followed by filtering, thus combining filtering with sampling rate changes. Multirate operations are used in many of today's systems, from MP3 players to JPEG and MPEG, to name a few.

The purpose of this section is to study these various multirate operations and their consequences on the resulting sequences and their spectra. These operations turn multirate systems into linear periodically shift-varying (LPSV) systems, represented by block-Toeplitz instead of Toeplitz matrices. For example, in Section 2.3.1 we encountered a block-averaging operator, (2.48), which is linear but not shift invariant; it is, however, LPSV. If the period is an integer $N$, then the system is really the superposition of $N$ LSI systems, and this is used in their analysis. While an LSI system is specified by its impulse response corresponding to a Kronecker delta sequence at time 0, an LPSV system is specified by $N$ impulse responses corresponding to Kronecker delta sequences at times $n = 0, 1, \ldots, N-1$.

While the LPSV nature of multirate systems does complicate analysis, we use a relatively simple and powerful tool called polyphase analysis to mediate the problem and reduce it to the study of LSI systems. The outline of the section follows naturally from the above introduction, moving from down- and upsampling together with filtering to polyphase analysis. Multirate processing, while not standard signal processing material, is central to filter bank and wavelet constructions. We summarize the main operations in multirate processing in Table 2.12 in Chapter at a Glance.

2.7.1 Downsampling 

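Downsampling by 2 Downsampling⁵⁶ by 2, as introduced in (2.178) and shown in Figure 2.17(a), is clearly not shift invariant. If the input is $x_n = \delta_n$, the output is $y_n = \delta_n$; if the input is $x_n = \delta_{n-1}$, the output is zero. However, if the input is $x_n = \delta_{n-2k}$, the output is $y_n = \delta_{n-k}$; this means the downsampler is an LPSV system, making all multirate systems involving downsampling LPSV. It is instructive to look at (2.178) in matrix notation:
$$
\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
 & \vdots & & & & & & & \\
\cdots & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
 & & & & & & & \vdots &
\end{bmatrix}
\begin{bmatrix} \vdots \\ x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \end{bmatrix}
=
\begin{bmatrix} \vdots \\ x_{-2} \\ x_0 \\ x_2 \\ x_4 \\ \vdots \end{bmatrix}, \tag{2.180a}
$$
$$
y = D_2 x, \tag{2.180b}
$$
where $D_2$ stands for the operator describing downsampling by 2. Inspection of $D_2$ shows that it is similar to an identity matrix, but with the odd rows taken out.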



⁵⁶Downsampling is also often referred to as subsampling or decimation.
(a) Block diagram. (b) Magnitude response of a sequence, $|X(e^{j\omega})|$, and (c) of its downsampled version, $|\tfrac{1}{2}(X(e^{j\omega/2}) + X(e^{j(\omega-2\pi)/2}))|$.
Figure 2.17: Downsampling by 2.



Intuitively, it is a rectangular operator, with the output space being a subspace of the input space. It is a linear operator and has no inverse.

To find the z-transform $Y(z)$, the downsampled sequence may be seen as a sequence $x_{0,n}$ having the even samples of $x_n$ with the odd ones set to zero, followed by contraction of the sequence by removing those zeros. Thus,
$$
x_{0,n} = \begin{cases} x_n, & n \text{ even};\\ 0, & n \text{ odd}. \end{cases}
$$
Its z-transform is
$$
\begin{aligned}
X_0(z) &= \tfrac{1}{2}\left[X(z) + X(-z)\right] \\
&= \tfrac{1}{2}\left[(\cdots + x_0 + x_1 z^{-1} + x_2 z^{-2} + \cdots) + (\cdots + x_0 - x_1 z^{-1} + x_2 z^{-2} - \cdots)\right] \\
&= (\cdots + x_{-2} z^{2} + x_0 + x_2 z^{-2} + \cdots) = \sum_{n \in \mathbb{Z}} x_{2n} z^{-2n},
\end{aligned}
$$
canceling the odd powers of $z$ and keeping the even ones. We now get $Y(z)$ by contracting $X_0(z)$ as
$$
Y(z) = \sum_{n \in \mathbb{Z}} x_{2n} z^{-n} = X_0(z^{1/2}) = \tfrac{1}{2}\left[X(z^{1/2}) + X(-z^{1/2})\right]. \tag{2.181}
$$
To find its DTFT, we simply evaluate $Y(z)$ at $z = e^{j\omega}$:
$$
Y(e^{j\omega}) = \tfrac{1}{2}\left[X(e^{j\omega/2}) + X(e^{j(\omega-2\pi)/2})\right], \tag{2.182}
$$



where $-e^{j\omega/2}$ can be written as $e^{j(\omega-2\pi)/2}$ since $e^{-j\pi} = -1$. With the help of Figure 2.17, we now analyze this formula. $X(e^{j\omega/2})$ is a stretched version of $X(e^{j\omega})$ (by a factor of 2) and is $4\pi$-periodic (since downsampling contracts time, it is natural that frequency expands accordingly). This is shown as the solid line in Figure 2.17(c). $X(e^{j(\omega-2\pi)/2})$ is not only a stretched version of $X(e^{j\omega})$, but is also shifted by $2\pi$, shown as the dashed line in Figure 2.17(c). The sum is again $2\pi$-periodic, since $Y(e^{j\omega}) = Y(e^{j(\omega-2\pi)})$. Both stretching and shifting create new frequencies, an unusual occurrence, since in LSI processing no new frequencies ever appear. The shifted version $X(e^{j(\omega-2\pi)/2})$ is called the aliased version of the original (a ghost image).

Example 2.24 (Downsampling of sequences) Consider first a standard example illustrating the effect of downsampling: $x_n = (-1)^n = \cos \pi n$, the highest-frequency discrete sequence. Its downsampled version is $y_n = x_{2n} = \cos 2\pi n = 1$, the lowest-frequency discrete sequence (a constant):
$$
\begin{bmatrix} \cdots & 1 & -1 & 1 & -1 & 1 & \cdots \end{bmatrix}^T \;\xrightarrow{\;2\downarrow\;}\; \begin{bmatrix} \cdots & 1 & 1 & 1 & 1 & 1 & \cdots \end{bmatrix}^T,
$$
changing the nature of the sequence.

Consider now the right-sided geometric series sequence, $x_n = \alpha^n u_n$, from (2.124a), with the z-transform (from Table 2.6)
$$
X(z) = \frac{1}{1 - \alpha z^{-1}}.
$$
Its downsampled version is
$$
y_n = x_{2n} = \alpha^{2n} u_{2n} \overset{(a)}{=} \beta^n u_n,
$$
where (a) follows from $u_{2n} = u_n$ and $\beta = \alpha^2$, with the z-transform (from Table 2.6)
$$
Y(z) = \frac{1}{1 - \alpha^2 z^{-1}},
$$
which we could have also obtained using the expression for downsampling (2.181):
$$
Y(z) = \frac{1}{2}\left(\frac{1}{1 - \alpha z^{-1/2}} + \frac{1}{1 + \alpha z^{-1/2}}\right) = \frac{1}{1 - \alpha^2 z^{-1}}.
$$
The downsampled sequence is again exponential, but decays faster (for $|\alpha| < 1$).
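The second half of Example 2.24 can be double-checked numerically. The minimal sketch below evaluates the downsampling formula (2.181) with $X(z) = 1/(1-\alpha z^{-1})$ at one point in the region of convergence, compares it with the closed form $1/(1-\alpha^2 z^{-1})$, and also checks the time-domain claim $x_{2n} = \beta^n$. The values of $\alpha$ and $z$ are arbitrary choices for illustration.

```python
import cmath

alpha = 0.8


def X(z):
    """X(z) = 1 / (1 - alpha z^{-1}), valid for |z| > |alpha|."""
    return 1 / (1 - alpha / z)


def Y_closed_form(z):
    """Y(z) = 1 / (1 - alpha^2 z^{-1}), valid for |z| > alpha^2."""
    return 1 / (1 - alpha ** 2 / z)


def Y_from_downsampling(z):
    """(2.181): Y(z) = [X(z^{1/2}) + X(-z^{1/2})] / 2; either square root works."""
    s = cmath.sqrt(z)
    return (X(s) + X(-s)) / 2


z = 1.5 * cmath.exp(0.7j)             # |z^{1/2}| is about 1.22 > alpha, so both series converge
beta = alpha ** 2
x = [alpha ** n for n in range(10)]   # x_n = alpha^n u_n for n = 0..9
y = x[::2]                            # y_n = x_{2n}
```

Because the formula is symmetric in $\pm z^{1/2}$, the choice of branch returned by `cmath.sqrt` does not matter.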

Downsampling by N We can generalize the discussion of downsampling by 2 to any integer $N$. The downsampled-by-$N$ sequence $y_n$ and its z-transform pair are
$$
y_n = x_{Nn} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad Y(z) = \frac{1}{N}\sum_{k=0}^{N-1} X(W_N^k z^{1/N}). \tag{2.183}
$$
The corresponding DTFT pair on the unit circle is
$$
y_n = x_{Nn} \quad\overset{\text{DTFT}}{\longleftrightarrow}\quad Y(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} X(e^{j(\omega-2\pi k)/N}), \tag{2.184}
$$
using
$$
W_N^k z^{1/N}\Big|_{z=e^{j\omega}} = e^{-j(2\pi/N)k}\, e^{j\omega/N} = e^{j(\omega-2\pi k)/N}.
$$
We have already seen these expressions in Section 2.4.3 and Table 2.4 as scaling in time. The proof is an extension of the $N = 2$ case, and we leave it as Exercise 2.20.
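For finite-length sequences, the DFT gives an exact and easily tested counterpart of (2.184): if $x$ has length $L$ divisible by $N$ and $M = L/N$, the length-$M$ DFT of $y_n = x_{Nn}$ is $Y_k = \frac{1}{N}\sum_{m=0}^{N-1} X_{k+mM}$, a sum of $N$ shifted copies of the spectrum (aliasing). This DFT version is a finite-length analog introduced here for illustration, not a formula from the text; the minimal sketch checks it for $N = 2$ and $N = 3$ on an arbitrary test sequence.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def aliased_spectrum(X, N):
    """Predict the DFT of x[::N] from the length-L DFT X of x (L divisible by N):
    Y_k = (1/N) * sum_{m=0}^{N-1} X_{k + m M}, with M = L / N."""
    M = len(X) // N
    return [sum(X[k + m * M] for m in range(N)) / N for k in range(M)]


x = [math.sin(0.9 * n) + 0.2 * n for n in range(12)]   # L = 12
X = dft(x)

for N in (2, 3):
    predicted = aliased_spectrum(X, N)
    actual = dft(x[::N])
    print(N, max(abs(p - a) for p, a in zip(predicted, actual)))  # rounding error only
```

The first part of Example 2.24 is a special case: for $x_n = (-1)^n$ of length 8, all energy sits in bin $X_4 = 8$, and `aliased_spectrum(dft([1, -1] * 4), 2)` moves it to DC with value 4, the DFT of a length-4 constant.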

2.7.2 Upsampling 



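Upsampling by 2 Upsampling by 2, as introduced in (2.179) and shown in Figure 2.18(a), stretches time by a factor of 2. In matrix notation, similarly to (2.180a):
$$
\begin{bmatrix} \vdots \\ y_{-2} \\ y_{-1} \\ y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
 & \vdots & & & & \\
\cdots & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & \cdots \\
 & & & & \vdots &
\end{bmatrix}
\begin{bmatrix} \vdots \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix} \vdots \\ x_{-1} \\ 0 \\ x_0 \\ 0 \\ x_1 \\ 0 \\ x_2 \\ \vdots \end{bmatrix}, \tag{2.185a}
$$
$$
y = U_2 x, \tag{2.185b}
$$
where $U_2$ stands for the upsampling-by-2 operator. The matrix $U_2$ looks like an identity matrix with rows of zeros in between every two rows. Another way to look at it is as an identity matrix with every other column removed.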

In the z-transform domain, the expression for upsampling by 2 is
$$
Y(z) = \sum_{n \in \mathbb{Z}} y_n z^{-n} = \sum_{n \in \mathbb{Z}} x_n z^{-2n} = X(z^2), \tag{2.186}
$$
since all odd terms of $y_n$ are zero, while the even ones are $y_{2n} = x_n$ following (2.179). In the frequency domain,
$$
Y(e^{j\omega}) = X(e^{j2\omega}), \tag{2.187}
$$
a contraction by a factor of 2, as shown in Figure 2.18(b) and (c).

Example 2.25 Take the constant sequence $x_n = 1$. Its upsampled version is $y = [\cdots\ 1\ 0\ 1\ 0\ 1\ \cdots]^T$, which can be written as
$$
y_n = \tfrac{1}{2}\left[1 + (-1)^n\right] = \tfrac{1}{2}(\cos 2\pi n + \cos \pi n),
$$
indicating that it contains both the original frequency (constant, DC value at the origin) and a new high frequency (at $\omega = \pi$, since $(-1)^n = e^{j\pi n} = \cos \pi n$).
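The DFT also gives a finite-length counterpart of (2.189): upsampling a length-$L$ sequence by $N$ produces a length-$NL$ sequence whose DFT is $Y_k = X_{k \bmod L}$, that is, $N$ compressed copies of the original spectrum (imaging). As before, this is an illustrative analog rather than a formula from the text. The second computation in the minimal sketch reproduces Example 2.25: the upsampled constant has energy only at DC and at the new frequency $\omega = \pi$ (bin $k = L$).

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def upsample(x, N=2):
    """Insert N - 1 zeros after each sample."""
    y = []
    for value in x:
        y.append(value)
        y.extend([0.0] * (N - 1))
    return y


def imaged_spectrum(X, N):
    """Predicted DFT of the upsampled-by-N sequence: Y_k = X_{k mod L}, k = 0..NL-1."""
    L = len(X)
    return [X[k % L] for k in range(N * L)]


x = [0.5, -1.0, 2.0, 0.25, 1.5]
Y_pred = imaged_spectrum(dft(x), 3)
Y_true = dft(upsample(x, 3))

# Example 2.25: x_n = 1 upsampled by 2 gives [1, 0, 1, 0, ...]
ones_up = dft(upsample([1.0] * 4, 2))
print([round(abs(v), 6) for v in ones_up])   # [4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
```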
(a) Block diagram. (b) Magnitude response of a sequence, $|X(e^{j\omega})|$, and (c) of its upsampled version, $|X(e^{j2\omega})|$.
Figure 2.18: Upsampling by 2.



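Upsampling by N We can generalize the discussion of upsampling by 2 to any integer $N$. A sequence upsampled by $N$ and its z-transform pair are given by
$$
y_n = \begin{cases} x_{n/N}, & n = \ell N;\\ 0, & \text{otherwise}, \end{cases}
\quad\overset{\text{ZT}}{\longleftrightarrow}\quad Y(z) = X(z^N). \tag{2.188}
$$
The corresponding DTFT pair on the unit circle is
$$
y_n = \begin{cases} x_{n/N}, & n = \ell N;\\ 0, & \text{otherwise}, \end{cases}
\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad Y(e^{j\omega}) = X(e^{jN\omega}). \tag{2.189}
$$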



2.7.3 Downsampling and Upsampling 

The downsampling and upsampling operators are transposes of each other; since they have real entries, the upsampling operator is the adjoint of the downsampling operator,
$$
U_2 = D_2^T = D_2^*. \tag{2.190}
$$
Upsampling Followed by Downsampling What about combinations of upsampling and downsampling? Clearly, upsampling by 2 followed by downsampling by 2 results in the identity
$$
D_2 U_2 = I, \tag{2.191}
$$
since the zeros added by upsampling are at odd-indexed locations and are subsequently eliminated by downsampling. Similarly,
$$
D_N U_N = I.
$$
Downsampling Followed by Upsampling The operations in the reverse order are more interesting; downsampling by 2 followed by upsampling by 2 results in a sequence where all odd-indexed samples have been replaced by zeros, or
$$
\begin{bmatrix} \cdots & x_{-1} & x_0 & x_1 & x_2 & \cdots \end{bmatrix}^T \;\xrightarrow{\;2\downarrow,\ 2\uparrow\;}\; \begin{bmatrix} \cdots & 0 & x_0 & 0 & x_2 & \cdots \end{bmatrix}^T.
$$
This operator,
$$
P = U_2 D_2, \tag{2.192}
$$
is an orthogonal projection operator onto the subspace of all even-indexed samples (sequences whose odd-indexed samples are zero). To verify this, we check idempotency (for the operator to be a projection, see Definition 1.27),
$$
P^2 = (U_2 D_2)(U_2 D_2) = U_2 (D_2 U_2) D_2 = U_2 D_2 = P,
$$
using (2.191), as well as self-adjointness (for the operator to be an orthogonal projection, see Definition 1.27),
$$
P^* = (U_2 D_2)^* = D_2^* U_2^* = U_2 D_2 = P,
$$
using (2.190). Similarly,
$$
U_N D_N = P_N,
$$
where $P_N$ is an orthogonal projection operator.
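On a finite window the operators become ordinary 0/1 matrices, which makes (2.190) to (2.192) easy to verify. The minimal sketch builds $D_2$ as an $M \times 2M$ selection matrix (a truncation of the infinite operator, assumed here for illustration), sets $U_2 = D_2^T$, and checks $D_2 U_2 = I$ as well as the idempotency and symmetry of $P = U_2 D_2$. Plain lists keep it dependency-free; the helper names are illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def downsampling_matrix(M, N=2):
    """M x (N M) matrix of D_N: output row n selects input entry N n."""
    return [[1 if col == N * row else 0 for col in range(N * M)] for row in range(M)]


M = 3
D2 = downsampling_matrix(M)
U2 = transpose(D2)                     # (2.190): U_2 = D_2^T
P = matmul(U2, D2)                     # (2.192): P = U_2 D_2
identity = [[1 if i == j else 0 for j in range(M)] for i in range(M)]

print([P[i][i] for i in range(2 * M)])   # [1, 0, 1, 0, 1, 0]: keeps even, zeros odd
```

Since $P$ is diagonal with 0/1 entries, it is visibly both idempotent and symmetric, which is exactly what the two derivations above establish for the infinite operator.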

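Applying this projection operator to a sequence $x_n$, we get the expressions in the z-transform and DTFT domains as
$$
y = U_2 D_2 x \quad\overset{\text{ZT}}{\longleftrightarrow}\quad Y(z) = \tfrac{1}{2}\left(X(z) + X(-z)\right), \qquad Y(e^{j\omega}) = \tfrac{1}{2}\left(X(e^{j\omega}) + X(e^{j(\omega+\pi)})\right). \tag{2.193}
$$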

2.7.4 Downsampling, Upsampling and Filtering 

During both downsampling and upsampling, new frequencies appear, and it is often desirable to filter them out. Lowpass filtering is usually applied before downsampling to avoid aliasing (confusion of frequencies due to overlapping spectra), while filtering is usually applied after upsampling to fill in the zeros between the samples of the original sequence and smooth the output sequence. We now consider these two cases in more detail.



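Filtering Followed by Downsampling Consider filtering followed by downsampling by 2, as illustrated in Figure 2.19. For simplicity, we look at a causal, length-$L$ FIR filter $\tilde{g}$⁵⁷ with operator $\tilde{G}$ as in (1.228), (1.229), or (2.63). Then, convolution with the filter $\tilde{g}_n$ followed by downsampling by 2 can be written as $y = D_2 \tilde{G} x$,
$$
\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix}
=
\underbrace{\begin{bmatrix}
 & \vdots & & & & & & \\
\cdots & \tilde{g}_1 & \tilde{g}_0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & \tilde{g}_3 & \tilde{g}_2 & \tilde{g}_1 & \tilde{g}_0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & \tilde{g}_3 & \tilde{g}_2 & \tilde{g}_1 & \tilde{g}_0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & \tilde{g}_3 & \tilde{g}_2 & \cdots \\
 & & & & & & \vdots &
\end{bmatrix}}_{D_2 \tilde{G}}
\begin{bmatrix} \vdots \\ x_{-3} \\ x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
= D_2 \tilde{G} x; \tag{2.194}
$$
the above is nothing else but the convolution operator $\tilde{G}$ with the odd rows removed (shown here for $L = 4$). From this matrix-vector product, we can also write the filtered and downsampled sequence with inner products, using (2.61a), as
$$
y_n = \sum_{k=0}^{L-1} \tilde{g}_k x_{2n-k} = \langle x_{2n-k}, \tilde{g}^*_k \rangle_k = \sum_{k \in \mathbb{Z}} x_k \tilde{g}_{2n-k} = \langle x_k, \tilde{g}^*_{2n-k} \rangle_k. \tag{2.195}
$$

⁵⁷From this point on, we use $\tilde{g}$ to denote a filter when followed by a downsampler; similarly, we use $g$ to denote a filter when preceded by an upsampler. This is done because in Part II of the book, this will be standard notation.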

In the z-transform domain, apply (2.181) to $\tilde{G}(z)X(z)$,
$$
Y(z) = \tfrac{1}{2}\left[\tilde{G}(z^{1/2})X(z^{1/2}) + \tilde{G}(-z^{1/2})X(-z^{1/2})\right], \tag{2.196a}
$$
and, on the unit circle for the DTFT,
$$
Y(e^{j\omega}) = \tfrac{1}{2}\left[\tilde{G}(e^{j\omega/2})X(e^{j\omega/2}) + \tilde{G}(e^{j(\omega-2\pi)/2})X(e^{j(\omega-2\pi)/2})\right]. \tag{2.196b}
$$

Figure 2.19 shows an input spectrum and its downsampled version. Figure 2.17(c) shows the spectrum without filtering, while Figure 2.19(c) shows the spectrum with filtering. Thus, when no filtering is used, aliasing perturbs the spectrum. When ideal lowpass filtering is used (Figure 2.19(b)), the spectrum from $-\pi/2$ to $\pi/2$ is preserved and the rest is set to zero, so that no aliasing occurs and the central lowpass part of the spectrum is preserved in the downsampled version (Figure 2.19(c)).

Example 2.26 (Filtering followed by downsampling) Consider the 2-point averaging filter $\tilde{g}_n = \tfrac{1}{2}(\delta_n + \delta_{n-1})$, whose output, downsampled by 2, $y = D_2 \tilde{G} x$, is
$$
y_n = \tfrac{1}{2}(x_{2n} + x_{2n-1}).
$$
Because of filtering, all input samples influence the output, as opposed to downsampling without filtering, where the odd-indexed samples had no impact.
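A direct way to compute $y = D_2 \tilde{G} x$ is to convolve and then keep the even-indexed outputs. The minimal sketch checks this against the inner-product form (2.195) and reproduces Example 2.26 with the averaging filter. It assumes finite, causal sequences stored as lists starting at index 0, with $x_m = 0$ outside the list; the helper names are illustrative.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences that start at index 0."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def filter_downsample(g, x):
    """y = D_2 G x: filter with g, then keep the even-indexed outputs."""
    return convolve(g, x)[::2]


def filter_downsample_direct(g, x):
    """(2.195): y_n = sum_k g_k x_{2n-k}, with x_m = 0 outside 0..len(x)-1."""
    n_out = (len(g) + len(x)) // 2        # number of even indices in the full output
    return [sum(g[k] * x[2 * n - k] for k in range(len(g)) if 0 <= 2 * n - k < len(x))
            for n in range(n_out)]


# Example 2.26: g_n = (delta_n + delta_{n-1}) / 2 gives y_n = (x_{2n} + x_{2n-1}) / 2.
x = [4, 2, 6, 8, 10, 0]
print(filter_downsample([0.5, 0.5], x))   # [2.0, 4.0, 9.0, 0.0]

g = [0.3, -1.0, 0.7, 0.2]
x2 = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0]
```

`filter_downsample` wastes half of the multiplications on outputs that are thrown away; the polyphase form derived later in this section avoids that.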

Upsampling Followed by Filtering Consider now upsampling followed by filtering, as shown in Figure 2.20. Using the matrix-vector notation, we can write the output
(a) Block diagram. (b) Filtered sequence. (c) Downsampled sequence, $|Y(e^{j\omega})|$.
Figure 2.19: Filtering and downsampling.



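as the product $G U_2$, where $G$ is a banded Toeplitz operator just like $\tilde{G}$:
$$
\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \end{bmatrix}
=
\underbrace{\begin{bmatrix}
 & \vdots & & & & \\
\cdots & g_1 & 0 & 0 & 0 & \cdots \\
\cdots & g_2 & g_0 & 0 & 0 & \cdots \\
\cdots & g_3 & g_1 & 0 & 0 & \cdots \\
\cdots & 0 & g_2 & g_0 & 0 & \cdots \\
\cdots & 0 & g_3 & g_1 & 0 & \cdots \\
\cdots & 0 & 0 & g_2 & g_0 & \cdots \\
 & & & & \vdots &
\end{bmatrix}}_{G U_2}
\begin{bmatrix} \vdots \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
= G U_2 x; \tag{2.197}
$$
again, this is nothing else but the convolution operator $G$ with the odd columns removed. Using inner products, we can express $y_n$ as
$$
y_n = \sum_{k \in \mathbb{Z}} g_{n-2k} x_k = \langle x_k, g^*_{n-2k} \rangle_k. \tag{2.198}
$$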



Another way to look at (2.197) and (2.198) is to see that each input sample $x_k$ generates an impulse response $g_n$ delayed by $2k$ samples and weighted by $x_k$. In the z-transform and DTFT domains, the output of upsampling followed by filtering is
$$
Y(z) = G(z)X(z^2), \tag{2.199a}
$$
$$
Y(e^{j\omega}) = G(e^{j\omega})X(e^{j2\omega}). \tag{2.199b}
$$
Figure 2.20 shows a spectrum, its upsampled version, and finally, its ideally filtered version. We see that the ghost (image) spectrum centered at $\omega = \pi$ is removed, and only the base spectrum around the origin remains.
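Similarly, $y = G U_2 x$ can be computed by explicit zero insertion followed by convolution, or by letting each $x_k$ add a copy of $g$ delayed by $2k$ samples, which is (2.198) read column by column and never multiplies by the inserted zeros. The minimal sketch compares the two; it assumes causal finite lists starting at index 0 and an upsampler that does not append a trailing zero, with arbitrary values.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences that start at index 0."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def upsample(x):
    """[x_0, 0, x_1, 0, ..., x_{K-1}] (no trailing zero)."""
    y = [0.0] * (2 * len(x) - 1)
    y[::2] = x
    return y


def upsample_filter(g, x):
    """y = G U_2 x via explicit zero insertion."""
    return convolve(g, upsample(x))


def upsample_filter_direct(g, x):
    """(2.198): y_n = sum_k g_{n-2k} x_k; each x_k adds a copy of g delayed by 2k."""
    y = [0.0] * (len(g) + 2 * len(x) - 2)
    for k, xk in enumerate(x):
        for m, gm in enumerate(g):
            y[2 * k + m] += gm * xk
    return y


g = [1.0, 0.5, -0.25, 2.0]
x = [3.0, -1.0, 0.5, 2.0]
```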
(a) Block diagram. (b) Original sequence, $|X(e^{j\omega})|$. (c) Upsampled and filtered sequence, $|Y(e^{j\omega})|$.
Figure 2.20: Upsampling and filtering.



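Example 2.27 (Upsampling and filtering) Consider the piecewise-constant filter $g_n$ with impulse response $g_n = \delta_n + \delta_{n-1}$. The sequence $x_n$, upsampled by 2 and filtered with $g_n$, leads to
$$
y_n = x_{\lfloor n/2 \rfloor} \quad\longleftrightarrow\quad y = \begin{bmatrix} \cdots & x_{-1} & x_{-1} & x_0 & x_0 & x_1 & x_1 & \cdots \end{bmatrix}^T, \tag{2.200}
$$
a staircase sequence, with stairs of height $x_n$ and length 2. A smoother interpolation is obtained with a linear interpolator:
$$
g_n = \tfrac{1}{2}\delta_{n+1} + \delta_n + \tfrac{1}{2}\delta_{n-1}.
$$
From (2.197) or (2.198), the even-indexed outputs are equal to input samples (at half the index), while odd-indexed outputs are averages of two input samples,
$$
y_n = \begin{cases} x_{n/2}, & n \text{ even};\\ \tfrac{1}{2}\left(x_{(n+1)/2} + x_{(n-1)/2}\right), & n \text{ odd}, \end{cases}
\quad\longleftrightarrow\quad
y = \begin{bmatrix} \cdots & x_{-1} & \tfrac{1}{2}(x_{-1}+x_0) & x_0 & \tfrac{1}{2}(x_0+x_1) & x_1 & \cdots \end{bmatrix}^T. \tag{2.201}
$$
Compare (2.201) with (2.200) to see why (2.201) is a smoother interpolation, and see Figure 2.21 for an example.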
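Both interpolators of Example 2.27 can be run through (2.198) directly. Because the linear interpolator is noncausal, the minimal sketch stores a filter as a list together with the index of its first tap (`g_start`, a parameter introduced only for this example); inputs are assumed to start at index 0 and the sample values are arbitrary.

```python
def upsample_filter(g, g_start, x, n_values):
    """(2.198): y_n = sum_k g_{n-2k} x_k, where g[i] holds g_{g_start + i}."""
    def tap(m):
        i = m - g_start
        return g[i] if 0 <= i < len(g) else 0.0
    return [sum(tap(n - 2 * k) * xk for k, xk in enumerate(x)) for n in n_values]


x = [1.0, 3.0, 2.0]

# Piecewise-constant interpolator g_n = delta_n + delta_{n-1}, cf. (2.200)
staircase = upsample_filter([1.0, 1.0], 0, x, range(6))
print(staircase)   # [1.0, 1.0, 3.0, 3.0, 2.0, 2.0]

# Linear interpolator g_n = delta_{n+1}/2 + delta_n + delta_{n-1}/2, cf. (2.201)
linear = upsample_filter([0.5, 1.0, 0.5], -1, x, range(5))
print(linear)      # [1.0, 2.0, 3.0, 2.5, 2.0]
```

The even-indexed entries of `linear` reproduce the input samples and the odd-indexed entries are midpoints, which is the pattern of (2.201).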



Downsampling, Upsampling and Filtering Earlier, we noted the duality of downsampling and upsampling, made explicit in the adjoint relation (2.190). What happens when filtering is involved: when is $D_2 \tilde{G}$ the adjoint of $G U_2$? By inspection,
(a) Sequence $x$. (b) Upsampled version. (c) Piecewise-constant interpolation. (d) Linear interpolation.
Figure 2.21: Upsampling and filtering.



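this holds when $g_n = \tilde{g}^*_{-n}$, since then⁵⁸
$$
(D_2 \tilde{G})^* =
\begin{bmatrix}
 & \vdots & & & & & \\
\cdots & \tilde{g}^*_{-2} & \tilde{g}^*_{0} & \tilde{g}^*_{2} & \tilde{g}^*_{4} & \tilde{g}^*_{6} & \cdots \\
\cdots & \tilde{g}^*_{-3} & \tilde{g}^*_{-1} & \tilde{g}^*_{1} & \tilde{g}^*_{3} & \tilde{g}^*_{5} & \cdots \\
\cdots & \tilde{g}^*_{-4} & \tilde{g}^*_{-2} & \tilde{g}^*_{0} & \tilde{g}^*_{2} & \tilde{g}^*_{4} & \cdots \\
\cdots & \tilde{g}^*_{-5} & \tilde{g}^*_{-3} & \tilde{g}^*_{-1} & \tilde{g}^*_{1} & \tilde{g}^*_{3} & \cdots \\
\cdots & \tilde{g}^*_{-6} & \tilde{g}^*_{-4} & \tilde{g}^*_{-2} & \tilde{g}^*_{0} & \tilde{g}^*_{2} & \cdots \\
 & & & & & \vdots &
\end{bmatrix}
=
\begin{bmatrix}
 & \vdots & & & & & \\
\cdots & g_2 & g_0 & g_{-2} & g_{-4} & g_{-6} & \cdots \\
\cdots & g_3 & g_1 & g_{-1} & g_{-3} & g_{-5} & \cdots \\
\cdots & g_4 & g_2 & g_0 & g_{-2} & g_{-4} & \cdots \\
\cdots & g_5 & g_3 & g_1 & g_{-1} & g_{-3} & \cdots \\
\cdots & g_6 & g_4 & g_2 & g_0 & g_{-2} & \cdots \\
 & & & & & \vdots &
\end{bmatrix}
= G U_2. \tag{2.202}
$$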



We could also show the above by using the definition of the adjoint operator in (1.44). That is, we want to find a $G$ so that $D_2 \tilde{G}$ and $G U_2$ are adjoints of each other,
$$
\langle D_2 \tilde{G} x, y \rangle = \langle x, G U_2 y \rangle. \tag{2.203}
$$
We thus write
$$
\langle D_2 \tilde{G} x, y \rangle \overset{(a)}{=} \langle \tilde{G} x, D_2^* y \rangle \overset{(b)}{=} \langle \tilde{G} x, U_2 y \rangle \overset{(c)}{=} \langle x, \tilde{G}^* U_2 y \rangle,
$$
where (a) and (c) follow from the definition of the adjoint operator, and (b) follows from (2.190). Then (2.203) will hold only if $\tilde{G}^* = G$, that is, $g_n = \tilde{g}^*_{-n}$.

⁵⁸Note that unlike in (2.194) and (2.197), here we do not assume causal filters.
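To make every operator finite, the minimal sketch below works with periodic (circular) sequences of even length $L$, an assumption not made in the text: $\tilde{G}$ and $G$ become circular convolutions, $D_2$ keeps the even-indexed entries, and $U_2$ inserts zeros. It checks (2.203) for real filters with $g_n = \tilde{g}_{-n}$ (indices taken modulo $L$); all values are arbitrary.

```python
def circular_convolve(h, x):
    """Periodic convolution of two length-L sequences: sum_k h_k x_{(n-k) mod L}."""
    L = len(x)
    return [sum(h[k] * x[(n - k) % L] for k in range(L)) for n in range(L)]


def inner(a, b):
    """<a, b> for real sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


L = 8
g_tilde = [0.5, -0.25, 0.75, 0.0, 0.0, 0.0, 0.1, 0.3]   # one period of the analysis filter
g = [g_tilde[(-n) % L] for n in range(L)]              # time reversal: g_n = g~_{-n}

x = [1.0, -1.0, 2.0, 0.5, 0.0, 3.0, -2.0, 1.5]          # length L
y = [0.2, -0.7, 1.1, 0.4]                              # length L / 2

lhs = inner(circular_convolve(g_tilde, x)[::2], y)     # <D_2 G~ x, y>
y_up = [y[n // 2] if n % 2 == 0 else 0.0 for n in range(L)]
rhs = inner(x, circular_convolve(g, y_up))             # <x, G U_2 y>
```

For complex filters the time reversal would also need a complex conjugate and the inner product would conjugate its second argument; the sketch sticks to real data for brevity.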

The fact that Hermitian transposition amounts to (conjugated) time reversal will appear prominently in our analysis of orthogonal filter banks in Chapter 7. In particular, we will prove that when the impulse response of the filter $g_n$ is orthogonal to its even shifts as in (2.204), and $\tilde{g}_n = g^*_{-n}$, then the operation of filtering, downsampling by 2, upsampling by 2 and filtering as in (7.18) is an orthogonal projection onto the subspace spanned by $g_n$ and its even shifts.

2.7.5 Multirate Identities 

Orthogonality of Filter's Impulse Response to its Even Shifts Filters that have impulse responses orthogonal to their even shifts,
$$
\langle g_n, g_{n-2k} \rangle_n = \delta_k, \tag{2.204}
$$
will play an important role in the analysis of filter banks. Geometrically, (2.204) means that the columns of $G U_2$ in (2.197) are orthonormal to each other (similarly for the rows of $D_2 \tilde{G}$ when $\tilde{g}_n = g^*_{-n}$), that is,
$$
I = (G U_2)^*(G U_2) = U_2^* G^* G U_2 = D_2 G^* G U_2. \tag{2.205}
$$
While we have written the above in its most general form using Hermitian transposition to allow for complex filters, most of the time we will be dealing with real filters, and thus simple transposition.

We can see (2.204) as the deterministic autocorrelation of $g$ downsampled by 2. Write the deterministic autocorrelation of $g$ as in (2.16),
$$
a_k = \langle g_n, g_{n-k} \rangle_n,
$$
and note that it has a single nonzero even-indexed term, $a_0 = 1$:
$$
a_{2k} = \delta_k. \tag{2.206}
$$
Assume now a real $g$; in the z-transform domain, $A(z) = G(z)G(z^{-1})$ using (2.142). Keeping only the even terms can be accomplished by adding $A(z)$ and $A(-z)$ and dividing by 2. Therefore, (2.206) can be expressed as
$$
A(z) + A(-z) = G(z)G(z^{-1}) + G(-z)G(-z^{-1}) = 2, \tag{2.207}
$$
which on the unit circle leads to
$$
|G(e^{j\omega})|^2 + |G(e^{j(\omega+\pi)})|^2 = 2. \tag{2.208}
$$
This quadrature mirror formula, also called power complementarity, will be central in the design of orthonormal filter banks in Chapter 7. In the above we have assumed that $g_n$ is real, and used both
$$
G(z)G(z^{-1})\big|_{z=e^{j\omega}} = G(e^{j\omega})G(e^{-j\omega}) = G(e^{j\omega})G^*(e^{j\omega}) = |G(e^{j\omega})|^2,
$$
as well as
$$
G(-z)G(-z^{-1})\big|_{z=e^{j\omega}} = |G(e^{j(\omega+\pi)})|^2.
$$
In summary, a real filter satisfying any of the conditions below is called orthogonal:
$$
\begin{aligned}
&\text{Matrix view:} && D_2 G^T G U_2 = I \\
\Longleftrightarrow\;&\text{Time domain:} && \langle g_n, g_{n-2k} \rangle_n = \delta_k \\
\Longleftrightarrow\;&\text{z-transform domain:} && G(z)G(z^{-1}) + G(-z)G(-z^{-1}) = 2 \\
\Longleftrightarrow\;&\text{DTFT domain:} && |G(e^{j\omega})|^2 + |G(e^{j(\omega+\pi)})|^2 = 2
\end{aligned} \tag{2.209}
$$
Compare this to the expression for the allpass filter in (2.114); the allpass impulse response $h$ is orthogonal to all its shifts, at the expense of no frequency selectivity. Here, on the other hand, we have a basis containing only even shifts; however, some frequency selectivity exists (typically, $g$ is a halfband lowpass filter). These issues will be explored in more detail in Chapter 7.
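The conditions in (2.209) can be spot-checked on two well-known real orthogonal filters: the Haar filter $[1/\sqrt{2}, 1/\sqrt{2}]$ and the length-4 Daubechies filter, whose coefficients are quoted here from their standard closed form (they are not given in the text). The minimal sketch tests the time-domain condition (2.204) and the DTFT condition (2.208) at a few arbitrary frequencies.

```python
import cmath
import math

SQRT2, SQRT3 = math.sqrt(2), math.sqrt(3)
haar = [1 / SQRT2, 1 / SQRT2]
daub4 = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
         (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]


def even_shift_inner(g, k):
    """<g_n, g_{n-2k}>_n for a real filter stored as a list starting at n = 0."""
    return sum(g[n] * g[n - 2 * k] for n in range(len(g)) if 0 <= n - 2 * k < len(g))


def dtft(g, w):
    """G(e^{jw}) = sum_n g_n e^{-jwn}."""
    return sum(gn * cmath.exp(-1j * w * n) for n, gn in enumerate(g))


def power_sum(g, w):
    """Left-hand side of (2.208): |G(e^{jw})|^2 + |G(e^{j(w + pi)})|^2."""
    return abs(dtft(g, w)) ** 2 + abs(dtft(g, w + math.pi)) ** 2


for name, g in (("Haar", haar), ("Daubechies-4", daub4)):
    shifts = [round(even_shift_inner(g, k), 12) for k in (-1, 0, 1)]
    print(name, shifts, round(power_sum(g, 0.37), 12))
```

The unnormalized averaging filter $[\tfrac12, \tfrac12]$ fails the test (its power sum is 1 at every frequency), which is why the orthonormal Haar filter carries the factor $1/\sqrt{2}$.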

Interchange of Multirate Operations and Filtering The first of the two identities states that downsampling by 2 followed by filtering with $G(z)$ is equivalent to filtering with $G(z^2)$ followed by downsampling by 2, as shown in Figure 2.22(a).

The second identity states that filtering with $G(z)$ followed by upsampling by 2 is equivalent to upsampling by 2 followed by filtering with $G(z^2)$, as shown in Figure 2.22(b). The proof of these identities is left as Exercise 2.23, and both results generalize to sampling rate changes by $N$.
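Both identities of Figure 2.22 are easy to confirm numerically on finite causal sequences (lists starting at index 0, an assumption of this sketch), using the fact that upsampling a filter's impulse response by 2 gives $G(z^2)$. This is only a spot check with arbitrary values; the proof remains Exercise 2.23.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences that start at index 0."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def upsample(x):
    """[x_0, 0, x_1, 0, ..., x_{K-1}]; applied to a filter g this gives G(z^2)."""
    y = [0.0] * (2 * len(x) - 1)
    y[::2] = x
    return y


def upsample_tail(x):
    """Upsampler that also appends the final zero, so the output length is 2K."""
    return upsample(x) + [0.0]


g = [1.0, -0.5, 0.25]
x = [2.0, 1.0, -1.0, 3.0, 0.5, -2.0]   # even length keeps both sides the same length

# Identity (a): downsample then G(z)  ==  G(z^2) then downsample
lhs_a = convolve(g, x[::2])
rhs_a = convolve(upsample(g), x)[::2]

# Identity (b): G(z) then upsample  ==  upsample then G(z^2)
lhs_b = upsample_tail(convolve(g, x))
rhs_b = convolve(upsample(g), upsample_tail(x))
```

The practical payoff of identity (a) is that the left-hand side filters at the low rate, so it needs roughly half the multiplications of the right-hand side.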

Commutativity of Upsampling and Downsampling Upsampling and downsampling by the same integer do not commute since, as we have seen, $U_2$ followed by $D_2$ is the identity, while $D_2$ followed by $U_2$ is a projection onto the subspace of even-indexed samples. Interestingly, upsampling by $N$ and downsampling by $M$ commute when $N$ and $M$ are relatively prime (that is, they have no common factor other than 1). The proof is the topic of Exercise 2.26, and a couple of illustrative examples are given in Exercise 2.27.

Example 2.28 (Commutativity of upsampling and downsampling) We look at upsampling by 2 and downsampling by 3. If we apply $U_2$ to $x_n$,
$$
U_2 x = \begin{bmatrix} \cdots & x_0 & 0 & x_1 & 0 & x_2 & 0 & x_3 & \cdots \end{bmatrix}^T,
$$
(a) Filtering and downsampling. (b) Upsampling and filtering.
Figure 2.22: Interchange of multirate operations and filtering.



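followed by $D_3$, we get
$$
D_3 U_2 x = \begin{bmatrix} \cdots & x_{-3} & 0 & x_0 & 0 & x_3 & 0 & x_6 & \cdots \end{bmatrix}^T,
$$
while, applying $D_3$ first,
$$
D_3 x = \begin{bmatrix} \cdots & x_{-3} & x_0 & x_3 & x_6 & \cdots \end{bmatrix}^T,
$$
followed by $U_2$ leads to the same result, $U_2 D_3 x = D_3 U_2 x$.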
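The minimal sketch below applies both orders of operations to a finite sequence starting at index 0 and compares them. The sequence length is chosen divisible by every factor used, so that the finite-length boundary does not create spurious differences; this is a spot check of Example 2.28, not a proof (that is Exercise 2.26).

```python
from math import gcd


def downsample(x, M):
    """Keep x_{Mn}."""
    return x[::M]


def upsample(x, N):
    """Insert N - 1 zeros after each sample (output length N * len(x))."""
    y = []
    for value in x:
        y.append(value)
        y.extend([0] * (N - 1))
    return y


def commute(x, N, M):
    """True if D_M U_N x == U_N D_M x for this finite, index-0 sequence x."""
    return downsample(upsample(x, N), M) == upsample(downsample(x, M), N)


x = list(range(1, 13))          # 12 samples, divisible by 2, 3 and 4
for N, M in ((2, 3), (3, 2), (2, 2), (2, 4)):
    print(N, M, gcd(N, M), commute(x, N, M))
```

The pairs with greatest common divisor 1 commute, while the pairs that share the factor 2 do not.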

2.7.6 Polyphase Representation

Multirate processing brings a major twist to signal processing: shift invariance is replaced by periodic shift variance, represented by block-Toeplitz matrices. This section examines a tool, called polyphase representation, that deals with such periodic shift variance. It is a key method to transform single-input single-output linear periodically shift-varying systems into multiple-input multiple-output linear shift-invariant systems. For simplicity, we introduce all of the concepts for downsampling/upsampling by 2, and generalize to $N$ only at the end of the section.

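Polyphase Representation of Sequences A convenient way to express shift variance of period 2 is to split the input $x_n$ into its even- and odd-indexed parts:
$$
\begin{aligned}
x_{0,n} &= x_{2n} &&\overset{\text{ZT}}{\longleftrightarrow}\quad X_0(z) = \sum_{n \in \mathbb{Z}} x_{2n} z^{-n}, \\
x_{1,n} &= x_{2n+1} &&\overset{\text{ZT}}{\longleftrightarrow}\quad X_1(z) = \sum_{n \in \mathbb{Z}} x_{2n+1} z^{-n}, \\
x_n &= \begin{cases} x_{0,n/2}, & n = 2k;\\ x_{1,(n-1)/2}, & n = 2k+1, \end{cases} &&\overset{\text{ZT}}{\longleftrightarrow}\quad X(z) = X_0(z^2) + z^{-1}X_1(z^2).
\end{aligned} \tag{2.210}
$$
In the above, $x_0$ is the even subsequence of $x$ downsampled by 2 and $x_1$ is the odd subsequence of $x$ downsampled by 2:
$$
x_0 = \begin{bmatrix} \cdots & x_{-2} & x_0 & x_2 & x_4 & \cdots \end{bmatrix}^T, \qquad
x_1 = \begin{bmatrix} \cdots & x_{-1} & x_1 & x_3 & x_5 & \cdots \end{bmatrix}^T.
$$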
Figure 2.23: Forward and inverse polyphase transform.



This is illustrated in Figure 2.23: to get the even subsequence, we simply remove the odd samples from $x$, while to get the odd subsequence, we shift $x$ by one to the left (advance by one, represented by $z$) and then remove the odd samples. To get the original sequence back, we reverse the process: we upsample each subsequence by 2, shift the odd one by one to the right (delay by one, represented by $z^{-1}$), and sum up. This very simple transform that separates $x$ into $x_0$ and $x_1$ is called the forward polyphase transform, with $x_0$ and $x_1$ the polyphase components of $x$. The inverse polyphase transform simply upsamples the two polyphase components and puts them back together by interleaving, as in (2.210).
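For finite lists starting at index 0 (an assumption of this sketch), the forward and inverse polyphase transforms are strided slicing and interleaving. The minimal sketch handles a general number of phases $N$, and the last line shows the effect of $U_2 D_2$ in the polyphase domain: keep $x_0$, zero out $x_1$. The sample values are arbitrary.

```python
def polyphase(x, N=2):
    """Forward polyphase transform: phase j holds x_{Nn + j}, j = 0..N-1 (cf. (2.210))."""
    return [x[j::N] for j in range(N)]


def inverse_polyphase(phases):
    """Inverse transform: interleave the phases, x_{Nn + j} = phases[j][n]."""
    N = len(phases)
    x = [None] * sum(len(p) for p in phases)
    for j, p in enumerate(phases):
        x[j::N] = p
    return x


x = [5, 1, 4, 1, 5, 9, 2, 6, 5]
x0, x1 = polyphase(x)
print(x0, x1)   # [5, 4, 5, 2, 5] [1, 1, 9, 6]

# U_2 D_2 in the polyphase domain: keep phase 0, zero out phase 1
print(inverse_polyphase([x0, [0] * len(x1)]))   # [5, 0, 4, 0, 5, 0, 2, 0, 5]
```

The slice assignment `x[j::N] = p` only works because each phase has exactly as many samples as there are slots with that residue modulo `N`, which holds by construction.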

For example, in the polyphase transform domain, the effect of the operator $y = U_2 D_2 x$ is very simple: $X_0(z)$ is kept while $X_1(z)$ is zeroed out:
$$
Y(z) = X_0(z^2).
$$

Let us call $a_{0,n}$ the deterministic autocorrelation sequence of the polyphase component $x_{0,n} = x_{2n}$, and, similarly, call $a_{1,n}$ the deterministic autocorrelation sequence of the polyphase component $x_{1,n} = x_{2n+1}$. Their deterministic crosscorrelation will be $c_{0,1,n}$. Then, using the same polyphase tools, we can represent the deterministic autocorrelation (2.142) via its polyphase components (for simplicity, we show it for a real sequence $x$ and in the z-transform domain):
$$
\begin{aligned}
A(z) &= X(z)X(z^{-1}) \\
&= \left(X_0(z^2) + z^{-1}X_1(z^2)\right)\left(X_0(z^{-2}) + z X_1(z^{-2})\right) \\
&= \left(A_0(z^2) + A_1(z^2)\right) + z^{-1}\left(C_{1,0}(z^2) + z^2 C_{0,1}(z^2)\right),
\end{aligned} \tag{2.211}
$$
where the first and second terms are the first and second polyphase components of the deterministic autocorrelation, respectively, and satisfy
$$
A_0(z^2) + A_1(z^2) = A_0(z^{-2}) + A_1(z^{-2}), \tag{2.212a}
$$
$$
C_{1,0}(z^2) + z^2 C_{0,1}(z^2) = z^2\left(C_{1,0}(z^{-2}) + z^{-2} C_{0,1}(z^{-2})\right). \tag{2.212b}
$$


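We can express the above in matrix form if we assume $x_n$ to be a vector sequence with its polyphase components as elements. Then, given
$$
x_n = \begin{bmatrix} x_{0,n} & x_{1,n} \end{bmatrix}^T, \tag{2.213}
$$
using (2.23), its deterministic autocorrelation is a matrix given by⁵⁹
$$
A_n = \begin{bmatrix} a_{0,n} & c_{0,1,n} \\ c_{0,1,-n} & a_{1,n} \end{bmatrix} = A^*_{-n}, \tag{2.214a}
$$
$$
A(e^{j\omega}) = \begin{bmatrix} A_0(e^{j\omega}) & C_{0,1}(e^{j\omega}) \\ C^*_{0,1}(e^{j\omega}) & A_1(e^{j\omega}) \end{bmatrix} = A^*(e^{j\omega}), \tag{2.214b}
$$
$$
A(z) = \begin{bmatrix} A_0(z) & C_{0,1}(z) \\ C_{0,1}(z^{-1}) & A_1(z) \end{bmatrix} = A^T(z^{-1}). \tag{2.214c}
$$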



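Polyphase Representation of Filtering The next element we must examine is a filter. We decompose a filter exactly as we do a sequence in (2.210), into its even and odd subsequences,
$$
g_0 = \begin{bmatrix} \cdots & g_{-2} & g_0 & g_2 & g_4 & \cdots \end{bmatrix}^T, \qquad
g_1 = \begin{bmatrix} \cdots & g_{-1} & g_1 & g_3 & g_5 & \cdots \end{bmatrix}^T,
$$
leading to
$$
g_{0,n} = g_{2n} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad G_0(z) = \sum_{n \in \mathbb{Z}} g_{2n} z^{-n}, \tag{2.215a}
$$
$$
g_{1,n} = g_{2n+1} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad G_1(z) = \sum_{n \in \mathbb{Z}} g_{2n+1} z^{-n}, \tag{2.215b}
$$
$$
G(z) = G_0(z^2) + z^{-1}G_1(z^2). \tag{2.215c}
$$
This decomposition can be described via the polyphase representation of filtering (convolution), mapping the z-transform of a filter, $G(z)$, into its polyphase representation $G_p(z)$:
$$
G(z) \quad\longleftrightarrow\quad G_p(z) = \begin{bmatrix} G_0(z) & z^{-1}G_1(z) \\ G_1(z) & G_0(z) \end{bmatrix}, \tag{2.216}
$$
depicted in Figure 2.24. The matrix $G_p(z)$ is pseudocirculant (see Appendix 2.B.2). We can easily check that (2.216) is true since, according to Figure 2.24,
$$
\begin{aligned}
Y(z) &= \begin{bmatrix} 1 & z^{-1} \end{bmatrix} G_p(z^2) \begin{bmatrix} X_0(z^2) \\ X_1(z^2) \end{bmatrix}
= \begin{bmatrix} 1 & z^{-1} \end{bmatrix} \begin{bmatrix} G_0(z^2)X_0(z^2) + z^{-2}G_1(z^2)X_1(z^2) \\ G_1(z^2)X_0(z^2) + G_0(z^2)X_1(z^2) \end{bmatrix} \\
&= G_0(z^2)X_0(z^2) + z^{-2}G_1(z^2)X_1(z^2) + z^{-1}\left(G_1(z^2)X_0(z^2) + G_0(z^2)X_1(z^2)\right) \\
&= \left(G_0(z^2) + z^{-1}G_1(z^2)\right)\left(X_0(z^2) + z^{-1}X_1(z^2)\right) = G(z)X(z).
\end{aligned}
$$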

We have seen in (2.202) that the adjoint of the filter operator $G$ is $\tilde{G} = G^*$, when $\tilde{g}_n = g^*_{-n}$. Using (2.216), we find the polyphase representation of that adjoint



⁵⁹See Footnote 52 on Page 236.
Figure 2.24: (a) Filtering and (b) its polyphase representation.



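as (assuming real filter coefficients):
$$
G_p^T(z^{-1}) = \begin{bmatrix} G_0(z^{-1}) & G_1(z^{-1}) \\ z G_1(z^{-1}) & G_0(z^{-1}) \end{bmatrix}
= \begin{bmatrix} \tilde{G}_0(z) & z^{-1}\tilde{G}_1(z) \\ \tilde{G}_1(z) & \tilde{G}_0(z) \end{bmatrix} = \tilde{G}_p(z), \tag{2.217}
$$
with $\tilde{G}_0(z) = G_0(z^{-1})$ and $\tilde{G}_1(z) = z G_1(z^{-1})$, so that $\tilde{G}_0(z^2) + z^{-1}\tilde{G}_1(z^2) = G(z^{-1})$, as expected.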



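Example 2.29 (Polyphase representation of filtering) Take $G(z) = 1 + z^{-1}$. Then, its polyphase representation (2.216) is
$$
G_p(z) = \begin{bmatrix} 1 & z^{-1} \\ 1 & 1 \end{bmatrix}. \tag{2.218}
$$
Its adjoint is
$$
G_p^T(z^{-1}) = \begin{bmatrix} 1 & 1 \\ z & 1 \end{bmatrix} = \begin{bmatrix} \tilde{G}_0(z) & z^{-1}\tilde{G}_1(z) \\ \tilde{G}_1(z) & \tilde{G}_0(z) \end{bmatrix}, \tag{2.219}
$$
yielding $\tilde{G}_0(z) = 1$, $\tilde{G}_1(z) = z$, and $\tilde{G}(z) = 1 + z$.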



Polyphase Representation of Upsampling Followed by Filtering Instead of starting with the polyphase representation of filtering followed by downsampling, we first cover upsampling followed by filtering, as it closely follows the polyphase representation of sequences we have just seen. Thus, upsampling by 2 followed by filtering with $G(z)$ leads to an output $Y(z)$,
$$
Y(z) = G(z)X(z^2) = \left(G_0(z^2) + z^{-1}G_1(z^2)\right)X(z^2) = Y_0(z^2) + z^{-1}Y_1(z^2),
$$
where the polyphase components of $Y(z)$ are $Y_0(z) = G_0(z)X(z)$ and $Y_1(z) = G_1(z)X(z)$, as shown in Figure 2.25(a).
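In the time domain, this says that the even- and odd-indexed outputs of upsampling followed by filtering are simply $g_0 * x$ and $g_1 * x$: two short filters running at the low rate, with no multiplications by inserted zeros. The minimal sketch checks this on causal finite lists (an assumption) with arbitrary values.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences that start at index 0."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def upsample(x):
    """[x_0, 0, x_1, 0, ..., x_{K-1}] (no trailing zero)."""
    y = [0.0] * (2 * len(x) - 1)
    y[::2] = x
    return y


g = [1.0, 0.5, -0.25, 2.0, 0.75]
x = [3.0, -1.0, 0.5, 2.0]

y = convolve(g, upsample(x))     # y = G U_2 x, computed at the high rate
y0 = convolve(g[0::2], x)        # Y_0(z) = G_0(z) X(z), computed at the low rate
y1 = convolve(g[1::2], x)        # Y_1(z) = G_1(z) X(z)
```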
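Polyphase Representation of Filtering Followed by Downsampling Given a polyphase representation of the input sequence $x$ (2.210), we need to find the polyphase representation of the filter $\tilde{g}$ so that the output after downsampling is correct. To do this, we call $\tilde{g}_0$ and $\tilde{g}_1$ the even and odd subsequences (polyphase components) of the filter. Then, observe that the convolution $\tilde{g} * x$ can be written as
$$
(\tilde{g} * x)_n = \sum_{k \in \mathbb{Z}} x_k \tilde{g}_{n-k} = \sum_{k \in \mathbb{Z}} x_{2k} \tilde{g}_{n-2k} + \sum_{k \in \mathbb{Z}} x_{2k+1} \tilde{g}_{n-2k-1},
$$
that is, as the sum of the contributions of the even- and the odd-indexed input samples. Depending on $n$, the result of the above convolution will be (up to index shifts)
$$
\begin{aligned}
n = 2k: &\quad \tilde{g} * x \;\to\; \tilde{g}_0 * x_0 + \tilde{g}_1 * x_1, \\
n = 2k+1: &\quad \tilde{g} * x \;\to\; \tilde{g}_1 * x_0 + \tilde{g}_0 * x_1.
\end{aligned}
$$
However, since this convolution is followed by downsampling, the result of the convolution for $n$ odd will disappear, leaving only
$$
\tilde{g}_0 * x_0 + \tilde{g}_1 * x_1 \quad\overset{\text{ZT}}{\longleftrightarrow}\quad \tilde{G}_0(z)X_0(z) + \tilde{G}_1(z)X_1(z). \tag{2.220}
$$
We can now use the above to find the expression for the polyphase representation of a filter. Using (2.196a) to express the output of filtering followed by downsampling, the right side of (2.220) must equal
$$
\begin{aligned}
\tilde{G}_0(z)X_0(z) + \tilde{G}_1(z)X_1(z) &= \tfrac{1}{2}\left[\tilde{G}(z^{1/2})X(z^{1/2}) + \tilde{G}(-z^{1/2})X(-z^{1/2})\right] \\
&\overset{(a)}{=} \tfrac{1}{2}\Big[\underbrace{\left(\tilde{G}_0(z) + z^{k/2}\tilde{G}_1(z)\right)}_{\tilde{G}(z^{1/2})}\underbrace{\left(X_0(z) + z^{-1/2}X_1(z)\right)}_{X(z^{1/2})} \\
&\qquad + \underbrace{\left(\tilde{G}_0(z) - z^{k/2}\tilde{G}_1(z)\right)}_{\tilde{G}(-z^{1/2})}\underbrace{\left(X_0(z) - z^{-1/2}X_1(z)\right)}_{X(-z^{1/2})}\Big],
\end{aligned}
$$
where in (a) we substituted $X(z)$ in its polyphase form using (2.210) and $\tilde{G}(z)$ in a general polyphase form $\tilde{G}(z) = \tilde{G}_0(z^2) + z^{k}\tilde{G}_1(z^2)$ with $k$ odd (we put in $k$ as we do not know whether this polyphase form will involve a delay or an advance). Rearrange the right-hand side as
$$
\tilde{G}_0(z)X_0(z) + z^{(k-1)/2}\tilde{G}_1(z)X_1(z)
+ \tfrac{1}{2}\left(z^{-1/2}\tilde{G}_0(z)X_1(z) + z^{k/2}\tilde{G}_1(z)X_0(z) - z^{-1/2}\tilde{G}_0(z)X_1(z) - z^{k/2}\tilde{G}_1(z)X_0(z)\right).
$$
The second summand vanishes identically, so this expression equals the right-hand side of (2.220) only if $z^{(k-1)/2} = 1$, that is, $k = 1$, and the polyphase representation of the filter is
$$
\tilde{g}_{0,n} = \tilde{g}_{2n} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad \tilde{G}_0(z) = \sum_{n \in \mathbb{Z}} \tilde{g}_{2n} z^{-n}, \tag{2.221a}
$$
$$
\tilde{g}_{1,n} = \tilde{g}_{2n-1} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad \tilde{G}_1(z) = \sum_{n \in \mathbb{Z}} \tilde{g}_{2n-1} z^{-n}, \tag{2.221b}
$$
$$
\tilde{G}(z) = \tilde{G}_0(z^2) + z\tilde{G}_1(z^2), \tag{2.221c}
$$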
(a) Upsampling and filtering from Figure 2.20(a). (b) Filtering and downsampling from Figure 2.19(a).
Figure 2.25: Polyphase representation of multirate operations. Note that the definitions of polyphase components in (a) and (b) are different; see (2.215) and (2.221).



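shown in Figure 2.25(b). Note z instead of $z^{-1}$ and $(2n-1)$ instead of $(2n+1)$,
compared with (2.210). The reason for $(2n-1)$ in (2.221b) is that the 0th element
of the polyphase component $\tilde{g}_1$ is the sample at $n = -1$, that is,

$$\tilde{g}_0 = \big[\ \cdots\ \ \tilde{g}_{-2}\ \ \boxed{\tilde{g}_0}\ \ \tilde{g}_2\ \ \cdots\ \big], \qquad \tilde{g}_1 = \big[\ \cdots\ \ \tilde{g}_{-3}\ \ \boxed{\tilde{g}_{-1}}\ \ \tilde{g}_1\ \ \tilde{g}_3\ \ \cdots\ \big],$$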



and the polyphase component $\tilde{g}_1$ must be advanced (shifted to the left) by one
(multiplication by z) after upsampling for proper reconstruction.
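The identity (2.220) with the filter polyphase convention (2.221) can be checked numerically. The sketch below is a minimal illustration, assuming finitely supported real sequences stored as Python dictionaries that map an index n (possibly negative) to a value; the helper names (`conv`, `downsample2`, `analysis_phases`, and so on) and the example data are ours, not from the text.

```python
from collections import defaultdict

def conv(a, b):
    """Linear convolution of finitely supported sequences given as {index: value}."""
    out = defaultdict(float)
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] += ai * bj
    return dict(out)

def add(a, b):
    """Elementwise sum of two sequences."""
    out = defaultdict(float)
    for seq in (a, b):
        for i, v in seq.items():
            out[i] += v
    return dict(out)

def downsample2(a):
    """Downsampling by 2: keep y_n = a_{2n}."""
    return {i // 2: v for i, v in a.items() if i % 2 == 0}

def sequence_phases(x):
    """Polyphase components of a sequence, (2.210): x_{0,n} = x_{2n}, x_{1,n} = x_{2n+1}."""
    x0 = {i // 2: v for i, v in x.items() if i % 2 == 0}
    x1 = {(i - 1) // 2: v for i, v in x.items() if i % 2 == 1}
    return x0, x1

def analysis_phases(g):
    """Polyphase components of the filter, (2.221): g_{0,n} = g_{2n}, g_{1,n} = g_{2n-1}."""
    g0 = {i // 2: v for i, v in g.items() if i % 2 == 0}
    g1 = {(i + 1) // 2: v for i, v in g.items() if i % 2 == 1}
    return g0, g1

def same(a, b, tol=1e-12):
    """Compare two sequences, treating missing indices as zeros."""
    return all(abs(a.get(i, 0.0) - b.get(i, 0.0)) <= tol for i in set(a) | set(b))

# Hypothetical example data; any finitely supported sequences would do.
x = {n: float(v) for n, v in zip(range(-3, 6), [2, -1, 4, 0, 3, 5, -2, 1, 7])}
g = {-2: 0.5, -1: -1.0, 0: 2.0, 1: 0.25, 2: -0.75}

direct = downsample2(conv(g, x))             # filter, then keep even-indexed samples
x0, x1 = sequence_phases(x)
g0, g1 = analysis_phases(g)
polyphase = add(conv(g0, x0), conv(g1, x1))  # left side of (2.220)
print(same(direct, polyphase))               # True
```

Python's `%` returns a nonnegative remainder for negative indices, so the same expressions handle samples on both sides of n = 0.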

The duality between (filtering, downsampling) and (upsampling, filtering) that we
have seen earlier shows through the polyphase decomposition as well; compare
(2.221c) and (2.215c). This duality, including the change from $z^{-1}$ to z and from
$(2n+1)$ to $(2n-1)$, is related to the transposition and time reversal seen in (2.202).



Polyphase Representation with Rate Changes by N Generalization now follows
naturally; the polyphase transform of size N decomposes a sequence into N phases

$$x_j = \big[\ \cdots\ \ x_{N(n-1)+j}\ \ x_{Nn+j}\ \ x_{N(n+1)+j}\ \ \cdots\ \big], \qquad (2.222a)$$



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavelets.org



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovacevic, and V. K. Goyal 






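for $j = 0, 1, \ldots, N-1$, leading to the expressions for a polyphase representation of a
sequence, filtering followed by downsampling, and upsampling followed by filtering:

$$x_{j,n} = x_{Nn+j} \;\overset{\text{ZT}}{\longleftrightarrow}\; X_j(z) = \sum_{n\in\mathbb{Z}} x_{Nn+j}z^{-n}, \qquad (2.222b)$$

$$X(z) = \sum_{j=0}^{N-1} z^{-j}X_j(z^N), \qquad (2.222c)$$

$$\tilde{g}_{j,n} = \tilde{g}_{Nn-j} \;\overset{\text{ZT}}{\longleftrightarrow}\; \tilde{G}_j(z) = \sum_{n\in\mathbb{Z}} \tilde{g}_{Nn-j}z^{-n}, \qquad (2.222d)$$

$$\tilde{G}(z) = \sum_{j=0}^{N-1} z^{j}\tilde{G}_j(z^N), \qquad (2.222e)$$

$$g_{j,n} = g_{Nn+j} \;\overset{\text{ZT}}{\longleftrightarrow}\; G_j(z) = \sum_{n\in\mathbb{Z}} g_{Nn+j}z^{-n}, \qquad (2.222f)$$

$$G(z) = \sum_{j=0}^{N-1} z^{-j}G_j(z^N). \qquad (2.222g)$$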



Note the difference between how the polyphase components of $\tilde{G}$ are defined compared
to the polyphase components of G. Those for G are numbered forward modulo N,
that is, the 0th polyphase component is the one at Nn, the 1st is the one at Nn + 1,
the 2nd at Nn + 2, and so on (same as for sequences). Those for $\tilde{G}$, on the other
hand, are numbered in reverse modulo N, that is, the 0th polyphase component is
the one at Nn, but the 1st is the one at $(Nn-1)$, the 2nd at $(Nn-2)$, and so
on, in reverse order from those for G. As an illustration, for N = 3, we give below the
polyphase components of the input sequence, of the filter in filtering followed by
downsampling, and of the filter in upsampling followed by filtering, respectively:



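$$\begin{aligned}
x_0 &= \big[\ \cdots\ \ x_{-3}\ \ \boxed{x_0}\ \ x_3\ \ x_6\ \ \cdots\ \big],\\
x_1 &= \big[\ \cdots\ \ x_{-2}\ \ \boxed{x_1}\ \ x_4\ \ x_7\ \ \cdots\ \big],\\
x_2 &= \big[\ \cdots\ \ x_{-1}\ \ \boxed{x_2}\ \ x_5\ \ x_8\ \ \cdots\ \big],
\end{aligned}$$

$$\begin{aligned}
\tilde{g}_0 &= \big[\ \cdots\ \ \tilde{g}_{-3}\ \ \boxed{\tilde{g}_0}\ \ \tilde{g}_3\ \ \tilde{g}_6\ \ \cdots\ \big],\\
\tilde{g}_1 &= \big[\ \cdots\ \ \tilde{g}_{-4}\ \ \boxed{\tilde{g}_{-1}}\ \ \tilde{g}_2\ \ \tilde{g}_5\ \ \cdots\ \big],\\
\tilde{g}_2 &= \big[\ \cdots\ \ \tilde{g}_{-5}\ \ \boxed{\tilde{g}_{-2}}\ \ \tilde{g}_1\ \ \tilde{g}_4\ \ \cdots\ \big],
\end{aligned}$$

$$\begin{aligned}
g_0 &= \big[\ \cdots\ \ g_{-3}\ \ \boxed{g_0}\ \ g_3\ \ g_6\ \ \cdots\ \big],\\
g_1 &= \big[\ \cdots\ \ g_{-2}\ \ \boxed{g_1}\ \ g_4\ \ g_7\ \ \cdots\ \big],\\
g_2 &= \big[\ \cdots\ \ g_{-1}\ \ \boxed{g_2}\ \ g_5\ \ g_8\ \ \cdots\ \big].
\end{aligned}$$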
Exercise 2.28 illustrates our discussion of the polyphase transform.
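The two numbering conventions in (2.222) can be sketched for general N. This minimal sketch, again assuming finitely supported sequences stored as dictionaries (helper names and data are ours), splits a sequence and a filter into N = 3 phases, rebuilds them as in (2.222c) and (2.222e), and checks the N-channel analogue of (2.220): filtering by $\tilde g$ followed by downsampling by N equals $\sum_j \tilde g_j * x_j$. That last identity is our extrapolation from (2.220) and (2.222), checked numerically here rather than quoted from the text.

```python
from collections import defaultdict

def conv(a, b):
    """Linear convolution of finitely supported sequences given as {index: value}."""
    out = defaultdict(float)
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] += ai * bj
    return dict(out)

def forward_phases(x, N):
    """Forward numbering, (2.222b) and (2.222f): x_{j,n} = x_{Nn+j}."""
    phases = [dict() for _ in range(N)]
    for m, v in x.items():
        j = m % N
        phases[j][(m - j) // N] = v
    return phases

def reverse_phases(g, N):
    """Reverse numbering, (2.222d): g_{j,n} = g_{Nn-j}."""
    phases = [dict() for _ in range(N)]
    for m, v in g.items():
        j = (-m) % N
        phases[j][(m + j) // N] = v
    return phases

def interleave_forward(phases, N):
    """Inverse of forward_phases, i.e. X(z) = sum_j z^{-j} X_j(z^N)."""
    return {N * n + j: v for j, p in enumerate(phases) for n, v in p.items()}

def interleave_reverse(phases, N):
    """Inverse of reverse_phases, i.e. G(z) = sum_j z^{j} G_j(z^N)."""
    return {N * n - j: v for j, p in enumerate(phases) for n, v in p.items()}

def same(a, b, tol=1e-12):
    """Compare two sequences, treating missing indices as zeros."""
    return all(abs(a.get(i, 0.0) - b.get(i, 0.0)) <= tol for i in set(a) | set(b))

N = 3
x = {n: float(n * n - 3 * n + 1) for n in range(-4, 9)}                 # hypothetical input
g = {-5: 1.0, -2: -0.5, -1: 2.0, 0: 0.25, 1: -1.0, 2: 0.75, 4: 0.5}    # hypothetical filter

xs = forward_phases(x, N)
gs = reverse_phases(g, N)
print(sorted(xs[1].items())[:3])   # x_1 holds x_{-2}, x_1, x_4 at n = -1, 0, 1
print(sorted(gs[1].items()))       # g~_1 holds g~_{-1} at n = 0 and g~_2 at n = 1
print(interleave_forward(xs, N) == x, interleave_reverse(gs, N) == g)

# Filtering by g~ followed by downsampling by N, computed directly ...
direct = {n // N: v for n, v in conv(g, x).items() if n % N == 0}
# ... and through the polyphase components.
poly = defaultdict(float)
for j in range(N):
    for n, v in conv(gs[j], xs[j]).items():
        poly[n] += v
print(same(direct, dict(poly)))
```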

2.8 Discrete Stochastic Processes and Systems 

Many applications of signal processing involve resolving, reducing, or exploiting un- 
certainty. Resolving uncertainty includes identifying which out of a set of sequences 
was transmitted over a noisy channel; reducing uncertainty includes estimating pa- 
rameters from noisy observations; and exploiting uncertainty includes cryptographic 
encoding in which the meanings of symbols are hidden from anyone lacking the key. 
Careful modeling of uncertainty is also exploited in compression when short
descriptions are assigned to the most likely inputs. The issue of uncertainty arises
in many other fields, where such processes are typically called time series; the
operations often include modeling such time series (for example, modeling noise
in a fluorescence microscope), estimating their parameters, and predicting their future
behavior (for example, the behavior of the stock market), among others.

As we have seen in Section 1.C, one of the tools for modeling uncertainty
is probability theory, while deriving probabilistic models from the observed data
is the task of statistics. In what follows, we discuss this stochastic framework
within discrete-time signal processing, following the structure of this chapter:
we start with discrete stochastic processes (random processes, time series). We
follow this with systems (almost exclusively LSI systems) and associated functions
on discrete stochastic processes, both in the time domain (as averages in the form of
means, variances, and correlation functions) and in the frequency domain (power
spectral density). We finish with a brief analysis of multirate systems with stochastic
behavior.



2.8.1 Processes 

A discrete stochastic process is an infinite-length sequence whose every element is
a random variable; in other words, it is a random process (see Section 1.C). For
example, our temperature example from the introduction could be such a sequence:
the temperature at noon in front of your house measured every day. If all random
variables have the same distribution and are independent, the process is called i.i.d.
(independent and identically distributed). We use the following functions defined on
a stochastic process:$^{60}$

$$\begin{aligned}
&\text{mean} && \mu_{x,n} = E[x_n],\\
&\text{variance} && \mathrm{var}(x_n) = E[(x_n-\mu_{x,n})^2],\\
&\text{standard deviation} && \sigma_{x,n} = \sqrt{\mathrm{var}(x_n)},\\
&\text{autocorrelation} && a_{x,k,n} = E[x_k\,x_{k-n}^*],\\
&\text{crosscorrelation} && c_{x,y,k,n} = E[x_k\,y_{k-n}^*].
\end{aligned} \qquad (2.223)$$



Most of the time we will assume we are dealing with wide-sense stationary (WSS)
processes, that is, processes whose mean is constant and whose autocorrelation
depends only on n:

$$\mu_{x,n} = \mu_x, \qquad a_{x,k,n} = a_{x,n}. \qquad (2.224)$$

60 Although we have seen a version of these in Section 1.C, we repeat them here for completeness.

Often, this assumption is valid at least for a portion of time of a given sequence; 
for example, many biological processes are stationary over a period of milliseconds, 
while the noise in a communications channel might be stationary over a much longer 
period of time. 

White Noise A particular discrete stochastic process that appears widely in
signal processing is the white noise$^{61}$ sequence x, whose mean is zero and whose
autocorrelation is $a_{x,n} = \sigma_x^2\delta_n$, or,

$$\mu_{x,n} = 0, \quad \mathrm{var}(x_n) = \sigma_x^2, \quad \sigma_{x,n} = \sigma_x, \quad a_{x,n} = \sigma_x^2\,\delta_n. \qquad (2.225)$$

If the underlying PDF is Gaussian, x is called white Gaussian noise.$^{62}$

The white noise sequence is uncorrelated, but not always independent (in the
case of the Gaussian PDF, it is automatically independent as well). The terms
whitening and decorrelation are often used for the process of making a given
sequence have zero mean and a single-term autocorrelation sequence; this is
essentially a diagonalization of the covariance matrix.
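A quick simulation makes (2.225) concrete. The sketch below draws i.i.d. Gaussian samples (white Gaussian noise with an arbitrarily chosen σ_x = 1.5) and estimates the autocorrelation at a few lags. It assumes ergodicity, so that a time average over one long realization stands in for the expectation; the seed and sample size are arbitrary choices.

```python
import random

def sample_autocorr(x, n):
    """Time-average estimate of a_{x,n} = E[x_k x_{k-n}] for a real sequence.
    Replacing the ensemble average by a time average assumes ergodicity."""
    return sum(x[k] * x[k - n] for k in range(n, len(x))) / (len(x) - n)

rng = random.Random(0)          # fixed seed so the run is reproducible
sigma = 1.5
x = [rng.gauss(0.0, sigma) for _ in range(200_000)]   # white Gaussian noise

estimates = [sample_autocorr(x, n) for n in range(4)]
for n, a in enumerate(estimates):
    model = sigma ** 2 if n == 0 else 0.0             # (2.225): a_{x,n} = sigma^2 delta_n
    print(f"n = {n}: estimate {a:.3f}, model {model}")
```

The estimate at n = 0 should be close to σ_x² = 2.25 and the others close to zero, up to statistical fluctuations of order 1/√(number of samples).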



2.8.2 Systems 

We now assume that our input x is a WSS sequence, and the system is LSI, described
by its impulse response h, as in Section 2.3.3. What can we say about the output?
It is given by (2.58), and we can compute on the output the same functions we
computed on the input (mean, variance, standard deviation, and autocorrelation).
We start with the mean:

$$\mu_{y,n} = E[y_n] \overset{(a)}{=} E\Big[\sum_{k\in\mathbb{Z}} x_k h_{n-k}\Big] \overset{(b)}{=} \sum_{k\in\mathbb{Z}} E[x_k]\, h_{n-k} \overset{(c)}{=} \mu_x \sum_{k\in\mathbb{Z}} h_{n-k} \overset{(d)}{=} \mu_x H(e^{j0}) = \mu_y, \qquad (2.226a)$$



where (a) follows from (2.58); (b) from the linearity of expectation; (c) from
x being WSS; and (d) from the frequency response of the LSI system (assuming it
exists). We can thus see that the mean of the output is a constant, independent of
n. The variance is

$$\mathrm{var}(y_n) = a_{y,0} - \mu_y^2, \qquad (2.226b)$$



61 We will shortly see that the DTFT of the autocorrelation of white noise is a constant,
mimicking the behavior of the spectrum of white light; thus the term white noise.

62 White Gaussian noise is a mathematically convenient model for many real-world processes,
and is often termed additive white Gaussian noise (AWGN), because in many models it is
added to the sequence of interest.
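and the autocorrelation

$$\begin{aligned}
a_{y,k,n} = E[y_k y_{k-n}^*] &\overset{(a)}{=} E\Big[\sum_{m\in\mathbb{Z}} x_{k-m}h_m \sum_{\ell\in\mathbb{Z}} x_{k-n-\ell}^*h_\ell^*\Big] \overset{(b)}{=} \sum_{m\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} h_m h_\ell^*\, E[x_{k-m}x_{k-n-\ell}^*]\\
&\overset{(c)}{=} \sum_{m\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} h_m h_\ell^*\, a_{x,n-(m-\ell)} \overset{(d)}{=} \sum_{p\in\mathbb{Z}}\Big(\sum_{m\in\mathbb{Z}} h_m h_{m-p}^*\Big) a_{x,n-p} \overset{(e)}{=} \sum_{p\in\mathbb{Z}} a_{h,p}\, a_{x,n-p} = a_{y,n},
\end{aligned} \qquad (2.226c)$$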



where (a) follows from the definition of convolution (2.58); (b) from linearity of
expectation; (c) from the expression for the autocorrelation of the WSS x, (2.223);
(d) from the change of variable $p = m - \ell$; and (e) from the definition of deterministic
autocorrelation (2.16). From this, we see that if the input is WSS, the output is
WSS as well (as the mean is a constant and the autocorrelation depends only on the
difference n). We also see that the autocorrelation of the output is the convolution
of the stochastic autocorrelation of the input and the deterministic autocorrelation
of the impulse response of the system.
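Both (2.226a) and (2.226c) can be checked by simulation. This sketch filters a WSS input (a constant mean μ_x plus white Gaussian noise) with a short hypothetical FIR filter h, then compares time-average estimates of the output mean and autocorrelation with the predictions μ_x H(e^{j0}) and Σ_p a_{h,p} a_{x,n-p}. It assumes real signals and ergodicity; the filter, seed, and lengths are arbitrary. Because a_{h,p} has finite support, the convolution sum is finite even though this a_{x,n} (which contains μ_x²) is not summable.

```python
import random

def det_autocorr(h, p):
    """Deterministic autocorrelation a_{h,p} = sum_m h_m h_{m-p} for a real FIR h (support 0..len(h)-1)."""
    return sum(h[m] * h[m - p] for m in range(len(h)) if 0 <= m - p < len(h))

def fir_filter(h, x):
    """y_n = sum_m h_m x_{n-m}, kept only where the filter fully overlaps x (no edge effects)."""
    L = len(h)
    return [sum(h[m] * x[n - m] for m in range(L)) for n in range(L - 1, len(x))]

rng = random.Random(1)
mu_x, sigma = 1.0, 1.0
h = [1.0, 0.5, -0.25]                                        # hypothetical FIR impulse response
x = [mu_x + rng.gauss(0.0, sigma) for _ in range(200_000)]   # WSS input
y = fir_filter(h, x)

def a_x(n):
    """Autocorrelation of this input: a_{x,n} = mu_x^2 + sigma^2 delta_n."""
    return mu_x ** 2 + (sigma ** 2 if n == 0 else 0.0)

def a_y_predicted(n):
    """(2.226c): a_{y,n} = sum_p a_{h,p} a_{x,n-p}; a_{h,p} = 0 for |p| >= len(h)."""
    L = len(h)
    return sum(det_autocorr(h, p) * a_x(n - p) for p in range(-(L - 1), L))

mu_y_predicted = mu_x * sum(h)                               # (2.226a): mu_y = mu_x H(e^{j0})
mu_y_est = sum(y) / len(y)
a_y_est = [sum(y[k] * y[k - n] for k in range(n, len(y))) / (len(y) - n) for n in range(4)]

print("mean:", round(mu_y_est, 3), "predicted", mu_y_predicted)
for n in range(4):
    print(n, round(a_y_est[n], 3), "predicted", a_y_predicted(n))
```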

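Similarly, we compute the crosscorrelation between the input and the output:

$$c_{x,y,k,n} = E[x_k y_{k-n}^*] \overset{(a)}{=} E\Big[x_k \sum_{\ell\in\mathbb{Z}} x_{k-n-\ell}^*h_\ell^*\Big] \overset{(b)}{=} \sum_{\ell\in\mathbb{Z}} h_\ell^*\, E[x_k x_{k-(n+\ell)}^*] \overset{(c)}{=} \sum_{\ell\in\mathbb{Z}} h_\ell^*\, a_{x,n+\ell} = c_{x,y,n}, \qquad (2.226d)$$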



where (a) follows from (2.58); (b) from linearity of expectation; and (c) from the
expression for the autocorrelation of the WSS x, (2.223). We will use the above
expressions shortly to make some important observations in the Fourier domain.
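A useful special case of (2.226d): for a white input with $a_{x,n} = \sigma_x^2\delta_n$, only the term ℓ = -n survives, so $c_{x,y,n} = \sigma_x^2 h_{-n}^*$, and the input-output crosscorrelation reveals the impulse response. The sketch below checks this by simulation for a hypothetical real FIR filter, assuming real signals and ergodicity (time averages in place of expectations).

```python
import random

rng = random.Random(2)
sigma = 1.0
h = [0.8, -0.4, 0.2]                 # hypothetical real FIR impulse response h_0, h_1, h_2
L, N = len(h), 200_000
x = [rng.gauss(0.0, sigma) for _ in range(N)]   # zero-mean white input
# y[n] = sum_m h_m x_{n-m}; only indices with full overlap (n >= L-1) are used below.
y = [0.0] * (L - 1) + [sum(h[m] * x[n - m] for m in range(L)) for n in range(L - 1, N)]

def cross_est(n):
    """Time-average estimate of c_{x,y,n} = E[x_k y_{k-n}] for real signals."""
    ks = range(max(0, L - 1 + n), min(N, N + n))
    return sum(x[k] * y[k - n] for k in ks) / len(ks)

def cross_predicted(n):
    """(2.226d) with a white input: c_{x,y,n} = sigma^2 h_{-n} (zero outside the support)."""
    return sigma ** 2 * h[-n] if 0 <= -n < L else 0.0

for n in range(-L, 2):
    print(n, round(cross_est(n), 3), cross_predicted(n))
```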

Autoregressive Moving-Average Model Given the input x and an LSI system h,
an ARMA model is used to analyze and possibly predict future values of the output,
with only a finite number of parameters. It is given by:

$$y_n = \sum_{k=0}^{M} b_k x_{n-k} + \sum_{k=1}^{N} a_k y_{n-k}, \qquad (2.227)$$

formally the same as the linear, constant-coefficient difference equation (2.53)
(except for the sign in front of the second sum).

When all the $b_k$, except for $b_0$, are zero, the sequence is called an autoregressive
(AR) process:

$$y_n = b_0 x_n + \sum_{k=1}^{N} a_k y_{n-k}.$$
Example 2.30 (First-order AR process) An AR-1 process is described via

$$y_n = x_n + a\,y_{n-1}, \qquad (2.228)$$

that is, an AR process with N = 1, $b_0 = 1$, and $a_1 = a$. We normalize its impulse
response to have unit norm, leading to:

$$h_n = \sqrt{1-a^2}\;a^n u_n, \qquad |a| < 1. \qquad (2.229)$$

When, instead, all the $a_k$ are zero, the sequence is called a moving average (MA)
process, as the output is a windowed, local view of a number of inputs (see also
Example 2.3):

$$y_n = \sum_{k=0}^{M} b_k x_{n-k}.$$



Example 2.31 (First-order MA process) An MA-1 process is described via

$$y_n = b_0 x_n + b_1 x_{n-1}, \qquad (2.230)$$

that is, an MA process with M = 1. We normalize it to have unit norm, with
$b_0 = b_1 = 1/\sqrt{2}$, leading to:

$$h_n = \tfrac{1}{\sqrt{2}}\,(\delta_n + \delta_{n-1}). \qquad (2.231)$$

Thus, an ARMA process consists of two parts: the AR part and the MA part. 
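The recursion (2.227) is straightforward to run. The sketch below is a minimal implementation assuming zero initial conditions (the function name `arma` and its argument layout are ours); it recovers the impulse response $a^n u_n$ of the AR-1 process (2.228) and checks that the normalized impulse responses (2.229) and (2.231) have unit norm.

```python
import math

def arma(b, a, x):
    """Run (2.227) with zero initial conditions:
    y_n = sum_{k=0}^{M} b_k x_{n-k} + sum_{k=1}^{N} a_k y_{n-k}.
    b = [b_0, ..., b_M]; a = [a_1, ..., a_N] (a[0] holds a_1)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc += sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y

impulse = [1.0] + [0.0] * 7

# AR-1, (2.228): y_n = x_n + a y_{n-1}; the impulse response is a^n u_n.
print(arma([1.0], [0.5], impulse))
# prints [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125]

# Unit-norm AR-1, (2.229): b_0 = sqrt(1 - a^2); the energy of a long truncation is essentially 1.
a = 0.9
h_ar = arma([math.sqrt(1 - a * a)], [a], [1.0] + [0.0] * 399)
print(sum(v * v for v in h_ar))

# Unit-norm MA-1, (2.231): b_0 = b_1 = 1/sqrt(2), no feedback terms.
h_ma = arma([1 / math.sqrt(2), 1 / math.sqrt(2)], [], impulse)
print(h_ma[:3], sum(v * v for v in h_ma))
```

Passing an empty list for the feedback coefficients turns the same function into a pure MA filter, mirroring how AR and MA are the two special cases of (2.227).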

2.8.3 Discrete-Time Fourier Transform 

As for deterministic sequences, we can use Fourier techniques to gain insight into
the behavior of discrete stochastic processes and systems. While we cannot take
the DTFT of a discrete stochastic process, as it is neither absolutely nor square
summable, we can make assessments based on averages (moments), such as the
DTFT of the autocorrelation. Let us assume a WSS x and find the DTFT of
its autocorrelation (2.223) (which we assume decays fast enough to be
absolutely summable):

$$A_x(e^{j\omega}) = \sum_{n\in\mathbb{Z}} a_{x,n}\,e^{-j\omega n} = \sum_{n\in\mathbb{Z}} E[x_k x_{k-n}^*]\,e^{-j\omega n}. \qquad (2.232)$$

This is called the power spectral density, the counterpart of the energy spectral
density for deterministic sequences in (2.96). For a WSS x, the autocorrelation and
the power spectral density form a DTFT pair, the result of the Wiener-Khinchin theorem.
The power spectral density is real and nonnegative (and, when x is real, also even),
and thus admits a spectral factorization

$$A_x(e^{j\omega}) = U(e^{j\omega})\,U^*(e^{j\omega}),$$
where $U(e^{j\omega})$ is its (nonunique) spectral root. The integral of the power spectral
density over the frequency range,

$$P_x = \frac{1}{2\pi}\int_{-\pi}^{\pi} A_x(e^{j\omega})\,d\omega = a_{x,0} = E[|x_n|^2], \qquad (2.233)$$

is the power, the counterpart of the energy for deterministic sequences in (2.98).
The power spectral density measures the distribution of power over the frequency
range. These four concepts, energy and energy spectral density for deterministic
sequences, and power and power spectral density for WSS processes, are summarized
in Table 2.8.



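$$\begin{array}{ll}
\text{Deterministic sequences} & \text{Discrete WSS processes}\\ \hline
\text{Energy spectral density} & \text{Power spectral density}\\
A(e^{j\omega}) = |X(e^{j\omega})|^2 & A(e^{j\omega}) = \sum_{n\in\mathbb{Z}} E[x_k x_{k-n}^*]\,e^{-j\omega n}\\[4pt]
\text{Energy} & \text{Power}\\
E = \frac{1}{2\pi}\int_{-\pi}^{\pi} A(e^{j\omega})\,d\omega = a_0 = \sum_{n\in\mathbb{Z}} |x_n|^2 & P = \frac{1}{2\pi}\int_{-\pi}^{\pi} A(e^{j\omega})\,d\omega = a_0 = E[|x_n|^2]
\end{array}$$

Table 2.8: Energy concepts for deterministic sequences and their counterparts, power
concepts for discrete WSS processes.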

Example 2.32 (First-order AR process, Example 2.30 cont'd) Let us take
a unit-norm version of an AR-1 process as in (2.229), driven by white noise of unit
variance. Its power spectral density is

$$A_x(e^{j\omega}) = \frac{1-a^2}{(1-ae^{-j\omega})(1-ae^{j\omega})} = \frac{1-a^2}{|1-ae^{j\omega}|^2}, \qquad |a| < 1. \qquad (2.234)$$

This function is positive definite, that is, $A_x(e^{j\omega}) > 0$.
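Formula (2.234) and the power relation (2.233) can be cross-checked numerically. The sketch assumes, as in the example, that the unit-norm AR-1 process is driven by unit-variance white noise, so that by (2.226c) its autocorrelation is $a_{x,n} = a^{|n|}$ (our computation from (2.229), not stated in the text). It compares the closed form with a truncated DTFT (2.232) of $a^{|n|}$ and approximates the power integral by averaging over a uniform frequency grid.

```python
import cmath
import math

def psd_closed(a, w):
    """(2.234): A_x(e^{jw}) = (1 - a^2) / |1 - a e^{-jw}|^2."""
    return (1 - a * a) / abs(1 - a * cmath.exp(-1j * w)) ** 2

def psd_from_autocorr(a, w, n_max=400):
    """DTFT (2.232) of a_{x,n} = a^{|n|}, truncated to |n| <= n_max."""
    return sum((a ** abs(n)) * cmath.exp(-1j * w * n) for n in range(-n_max, n_max + 1)).real

a = 0.9
for w in (0.0, 0.5, math.pi / 2, math.pi):
    print(f"w = {w:.3f}: closed form {psd_closed(a, w):.6f}, from a^|n| {psd_from_autocorr(a, w):.6f}")

# (2.233): power = (1/2pi) * integral of the PSD. For a smooth periodic integrand, the
# average over a uniform grid on [0, 2pi) is an extremely accurate quadrature.
K = 4096
power = sum(psd_closed(a, 2 * math.pi * k / K) for k in range(K)) / K
print("power:", power)   # close to a_{x,0} = 1
```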

Example 2.33 (First-order MA process, Example 2.31 cont'd) Let us
take a unit-norm version of an MA-1 process as in (2.231), driven by white noise of
unit variance. Its power spectral density is

$$A_x(e^{j\omega}) = \tfrac{1}{2}(1+e^{j\omega})(1+e^{-j\omega}) = 1 + \tfrac{1}{2}(e^{j\omega}+e^{-j\omega}) = 1 + \cos\omega. \qquad (2.235)$$

It is positive semidefinite, that is, $A_x(e^{j\omega}) \geq 0$, being zero for $\omega = (2k+1)\pi$.
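For the MA-1 case the autocorrelation has only three nonzero terms. Assuming unit-variance white input, (2.226c) gives $a_{x,0} = 1$ and $a_{x,\pm 1} = 1/2$ (our computation), and the DTFT (2.232) of this sequence should reproduce (2.235), including the zero at ω = π. A minimal check:

```python
import cmath
import math

def psd_from_autocorr(acorr, w):
    """DTFT (2.232) of a finitely supported autocorrelation {n: a_n}, evaluated at frequency w."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in acorr.items())

# Unit-norm MA-1 (2.231) driven by unit-variance white noise.
acorr = {-1: 0.5, 0: 1.0, 1: 0.5}
for w in (0.0, math.pi / 3, math.pi / 2, math.pi):
    A = psd_from_autocorr(acorr, w)
    print(f"w = {w:.4f}: DTFT = {A.real:+.6f} (imag {A.imag:+.1e}), 1 + cos w = {1 + math.cos(w):.6f}")
```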

Given (2.226c), the power spectral density of the output can be expressed as

$$A_y(e^{j\omega}) = A_h(e^{j\omega})\,A_x(e^{j\omega}) = |H(e^{j\omega})|^2 A_x(e^{j\omega}), \qquad (2.236)$$

where $A_h(e^{j\omega}) = |H(e^{j\omega})|^2$ is the DTFT of the deterministic autocorrelation of h,
according to Table 2.4. The quantity

$$P_y = E[|y_n|^2] = \frac{1}{2\pi}\int_{-\pi}^{\pi} A_y(e^{j\omega})\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(e^{j\omega})|^2 A_x(e^{j\omega})\,d\omega = a_{y,0},$$
is the output power. Similarly to (2.236), and from (2.226d), we can express the
crosscorrelation between the input and the output as

$$C_{x,y}(e^{j\omega}) = H^*(e^{j\omega})\,A_x(e^{j\omega}). \qquad (2.237)$$

White Noise Using (2.225) and Table 2.4, we see that the power spectral density
of white noise is a constant:

$$A_x(e^{j\omega}) = \sigma_x^2, \qquad (2.238)$$

that is, its variance, or power, is

$$P_x = \frac{1}{2\pi}\int_{-\pi}^{\pi} \sigma_x^2\,d\omega = \sigma_x^2.$$



Power Spectral Density Estimation Estimating the power spectral density is
important, as it characterizes a WSS process in the frequency domain. This is
typically done by estimating its local behavior, requiring some form of a local
Fourier transform. This local estimate is called a periodogram; its definition and
applications are given in Section 8.3.2, as periodograms are implemented using filter banks.

Orthogonality of WSS Processes Two deterministic sequences are orthogonal when
their inner product is zero. Just as for random vectors (see Appendix 1.C), we need
an extension of this concept to handle stochastic processes. For simplicity, we
consider only discrete WSS processes.



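Definition 2.17 Two discrete WSS processes $x_n$ and $y_n$ are called orthogonal when

$$c_{x,y,n} = E[x_k y_{k-n}^*] = 0, \qquad n \in \mathbb{Z}, \qquad (2.239a)$$

or, equivalently, in the DTFT domain,

$$C_{x,y}(e^{j\omega}) = 0, \qquad \omega \in \mathbb{R}. \qquad (2.239b)$$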



Orthogonality properties for both deterministic sequences and discrete stochastic
processes are summarized in Table 2.9.

$$\begin{array}{lll}
 & \text{Deterministic sequences} & \text{WSS sequences}\\ \hline
\text{Time} & c_{x,y,n} = \sum_{k\in\mathbb{Z}} x_k y_{k-n}^* = \langle x_k, y_{k-n}\rangle_k = 0 & c_{x,y,n} = E[x_k y_{k-n}^*] = 0\\
\text{Frequency} & C_{x,y}(e^{j\omega}) = X(e^{j\omega})\,Y^*(e^{j\omega}) = 0 & C_{x,y}(e^{j\omega}) = 0
\end{array}$$

Table 2.9: Orthogonality for deterministic sequences and discrete stochastic processes.
Figure 2.26: Wiener filtering as the estimation of the original sequence x (corrupted
by noise w) by finding a filter with the impulse response h such that the estimate (filtered
output) $\hat{x}$ is a minimum MSE estimate of x, obtained by minimizing the mean squared error $E[e_n^2]$.



Example 2.34 (Wiener filtering) Consider a zero-mean WSS sequence $x_n$
with autocorrelation $a_{x,n}$ and the corresponding DTFT $A_x(e^{j\omega})$. Consider now
a version corrupted by noise as in Figure 2.26,

$$y_n = x_n + w_n, \qquad (2.240a)$$

where $w_n$ is additive, zero-mean, WSS noise with autocorrelation $a_{w,n}$ and DTFT
$A_w(e^{j\omega})$, independent of $x_n$. We want to estimate our original sequence
$x_n$ by finding a filter with the impulse response $h_n$ such that the filtered output

$$\hat{x} = h * y = h * (x + w) \qquad (2.240b)$$

is a minimum MSE estimate of $x_n$, by solving

$$\min_h E[e_n^2] = \min_h E[(x_n - \hat{x}_n)^2]. \qquad (2.240c)$$



This minimization can be solved by writing $E[e_n^2]$ as a function of the impulse response
h and setting the derivatives with respect to the $h_n$ to zero (Exercise 2.7).

Instead, we are going to use a geometric approach. Because $\hat{x} = h * y$,
the best estimate is the orthogonal projection of x onto the subspace
$S = \mathrm{span}(\{y_{n-k}\}_{k\in\mathbb{Z}})$. The orthogonality principle states that the error e is
orthogonal to S, and thus to the estimate $\hat{x}$. From Definition 2.17,

$$E[(x_n - \hat{x}_n)\,\hat{x}_{n-m}^*] = 0. \qquad (2.241a)$$

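Evaluating the two terms separately, we start with the first:

$$E[x_n\hat{x}_{n-m}^*] = E\big[x_n\big(h^* * (x^* + w^*)\big)_{n-m}\big] \overset{(a)}{=} E\big[x_n\big(h^* * x^*\big)_{n-m}\big],$$

where (a) follows from the orthogonality of x and w. Rewriting this in the Fourier
domain,

$$C_{x,\hat{x}}(e^{j\omega}) = H^*(e^{j\omega})\,A_x(e^{j\omega}). \qquad (2.241b)$$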
[Figure 2.27: (a) Power spectral density; (b) Wiener filters.]

Figure 2.27: Wiener filtering. (a) Power spectral density $A_x(e^{j\omega})$ of an AR-1 process
with a = 0.9 in (2.234) and noise levels of $A_w(e^{j\omega}) = \sigma_w^2 \in \{0.5, 1, 2, 5\}$ (white grid
lines). (b) Magnitude responses of the corresponding Wiener filters $H(e^{j\omega})$ for noise levels
of $A_w(e^{j\omega}) = \sigma_w^2 \in \{0.5, 1, 2, 5\}$.



Next, evaluate the second term in (2.241a),

$$a_{\hat{x},m} = E[\hat{x}_n\hat{x}_{n-m}^*] = E\big[\big(h * (x + w)\big)_n\big(h^* * (x^* + w^*)\big)_{n-m}\big].$$

Out of the four terms in the above expression, the two involving both x and w are
zero by orthogonality, while the other two yield (in the Fourier domain)

$$A_{\hat{x}}(e^{j\omega}) = |H(e^{j\omega})|^2\big(A_x(e^{j\omega}) + A_w(e^{j\omega})\big). \qquad (2.241c)$$

Substituting (2.241b) and (2.241c) into (2.241a) leads to

$$H^*(e^{j\omega})A_x(e^{j\omega}) = |H(e^{j\omega})|^2\big(A_x(e^{j\omega}) + A_w(e^{j\omega})\big),$$

and finally,

$$H(e^{j\omega}) = \frac{A_x(e^{j\omega})}{A_x(e^{j\omega}) + A_w(e^{j\omega})}. \qquad (2.241d)$$

As a concrete example, choose x as an AR-1 process from Example 2.32
with $A_x(e^{j\omega})$ as in (2.234). For w, choose white noise, (2.225), with variance $\sigma_w^2$,
or, $A_w(e^{j\omega}) = \sigma_w^2$. Then, the Wiener filter is

$$H(e^{j\omega}) = \frac{\dfrac{1-a^2}{|1-ae^{-j\omega}|^2}}{\dfrac{1-a^2}{|1-ae^{-j\omega}|^2} + \sigma_w^2} = \frac{1-a^2}{1-a^2+\sigma_w^2\,|1-ae^{-j\omega}|^2}, \qquad (2.242)$$

illustrated in Figure 2.27 for a = 0.9 and various noise levels.

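The Wiener filter (2.242) can be tabulated directly. This sketch evaluates it on a coarse frequency grid for a = 0.9 and the noise levels of Figure 2.27, and cross-checks it against the general form (2.241d) with $A_x$ from (2.234) and $A_w = \sigma_w^2$. The grid and printout format are arbitrary choices of ours.

```python
import cmath
import math

def ar1_psd(a, w):
    """(2.234): A_x(e^{jw}) = (1 - a^2) / |1 - a e^{-jw}|^2."""
    return (1 - a * a) / abs(1 - a * cmath.exp(-1j * w)) ** 2

def wiener_general(Ax, Aw):
    """(2.241d): H = A_x / (A_x + A_w)."""
    return Ax / (Ax + Aw)

def wiener_ar1(a, sigma_w2, w):
    """(2.242): H = (1 - a^2) / (1 - a^2 + sigma_w^2 |1 - a e^{-jw}|^2)."""
    return (1 - a * a) / (1 - a * a + sigma_w2 * abs(1 - a * cmath.exp(-1j * w)) ** 2)

a = 0.9
grid = [math.pi * k / 8 for k in range(9)]     # frequencies 0, pi/8, ..., pi
for sigma_w2 in (0.5, 1, 2, 5):                # noise levels used in Figure 2.27
    H = [wiener_ar1(a, sigma_w2, w) for w in grid]
    print(sigma_w2, [round(v, 3) for v in H])
```

Because the AR-1 process with a = 0.9 concentrates its power at low frequencies while white noise is flat, each filter is lowpass, and it attenuates more as the noise level grows.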
2.8.4 Multirate Sequences and Systems 

An interesting question is what happens when a discrete stochastic process makes 
its way through a multirate system, with any combination of multirate operations 
we have seen so far. We will see that the notion of periodic shift variance for de- 
terministic systems has its counterpart in the notion of wide-sense cyclostationarity 
for stochastic systems. 
Definition 2.18 A discrete stochastic process x is called wide-sense cyclostationary
of period N (WSCS$_N$) when the vector of its polyphase components is WSS.



Our temperature example comes in handy here as well; we take the tempera- 
ture sequence x and decompose it into its polyphase components modulo 365. Then, 
every calendar day can follow its own statistical behavior. For example, the temper- 
ature at noon on January 14th in New York City will likely be low, while the same 
measurement taken on July 14th will likely be high. The notion of cyclostationarity 
for LPSV systems is very intuitive given their cyclic nature. We now discuss a few 
basic operations for illustration only; see Further Reading for pointers to literature. 

In what follows, we will see that while downsampling affects the output power 
spectral density, it does not affect the WSS nature of the sequence. On the other 
hand, upsampling does affect the nature of the sequence and causes the output to 
become WSCS instead. 

The mean and autocorrelation of a WSCS$_N$ sequence x satisfy

$$\mu_{x,n+N} = E[x_{n+N}] = E[x_n] = \mu_{x,n}, \qquad (2.243a)$$

$$a_{x,k+N,n} = E[x_{k+N}x_{k+N-n}^*] = E[x_k x_{k-n}^*] = a_{x,k,n}, \qquad (2.243b)$$

and also

$$x \text{ is WSCS}_N \;\Rightarrow\; x \text{ is WSCS}_{kN}, \qquad k \in \mathbb{Z}^+.$$



Beware that (2.243b) does not imply that $a_{x,k,n}$ is periodic in n, as we now illustrate:

Example 2.35 (Generative model for a WSCS$_2$ sequence) Given is a system
as in Figure 2.28 with x an AWGN sequence, that is, it is $\mathcal{N}(0, \sigma_x^2)$ (we can
assume for simplicity it is $\mathcal{N}(0, 1)$).$^{63}$ Then, the $y_n$ are independent random variables
with the following distributions:

$$y_{2n} \sim \mathcal{N}(0, \sigma_0^2), \qquad y_{2n+1} \sim \mathcal{N}(0, \sigma_1^2),$$

and thus, from (2.225), their autocorrelations are:

$$a_{y,2k,n} = E[y_{2k}y_{2k-n}^*] = \sigma_0^2\,\delta_n, \qquad a_{y,2k+1,n} = E[y_{2k+1}y_{2k+1-n}^*] = \sigma_1^2\,\delta_n.$$

The above autocorrelation is not periodic in n as, for example,

$$a_{y,2k,n+2} = E[y_{2k}y_{2k-n-2}^*] = \sigma_0^2\,\delta_{n+2} \neq a_{y,2k,n}.$$

The sequence y is, however, WSCS with period 2 (easy to check from (2.243)).
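Since Figure 2.28 is not reproduced here, the sketch below assumes a hypothetical generative model that yields exactly the distributions stated in the example: $y_{2n} = \sigma_0 x_{2n}$ and $y_{2n+1} = \sigma_1 x_{2n+1}$ with x white $\mathcal{N}(0,1)$. It estimates $a_{y,k,n}$ separately for even and odd k (time averages over each phase, assuming ergodicity), showing that the statistics depend on the phase of k while the lag structure is $\delta_n$.

```python
import random

rng = random.Random(3)
sigma0, sigma1 = 0.5, 2.0            # hypothetical per-phase standard deviations
K = 100_000                          # number of periods
x = [rng.gauss(0.0, 1.0) for _ in range(2 * K)]                  # WSS input, N(0, 1)
y = [(sigma0 if n % 2 == 0 else sigma1) * x[n] for n in range(2 * K)]

def phase_autocorr(y, phase, n, period=2):
    """Estimate a_{y,k,n} = E[y_k y_{k-n}] over all k = phase (mod period) with k - n >= 0."""
    ks = [k for k in range(phase, len(y), period) if k - n >= 0]
    return sum(y[k] * y[k - n] for k in ks) / len(ks)

for phase in (0, 1):
    print("k mod 2 =", phase, [round(phase_autocorr(y, phase, n), 3) for n in range(3)])
```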



63 While we will examine the effects of downsamplers and upsamplers in what follows, the example 
is simple enough that we can go through it now. 
Figure 2.28: Generative model for a WSCS$_2$ sequence. The input x is WSS, while
the output y is WSCS$_2$ (Example 2.35).



As we have done earlier for deterministic sequences, we can characterize vector
processes using autocorrelation matrices. For the sake of simplicity, let us consider
a WSS sequence x and the vector of its polyphase components as in (2.213). Then,
its matrix autocorrelation will be



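$$\begin{bmatrix} a_{0,n} & a_{01,n}\\ a_{10,n} & a_{1,n}\end{bmatrix} = \begin{bmatrix} E[x_{0,k}x_{0,k-n}^*] & E[x_{0,k}x_{1,k-n}^*]\\ E[x_{1,k}x_{0,k-n}^*] & E[x_{1,k}x_{1,k-n}^*]\end{bmatrix} \overset{(a)}{=} \begin{bmatrix} E[x_{2k}x_{2k-2n}^*] & E[x_{2k}x_{2k-2n+1}^*]\\ E[x_{2k+1}x_{2k-2n}^*] & E[x_{2k+1}x_{2k-2n+1}^*]\end{bmatrix} \overset{(b)}{=} \begin{bmatrix} a_{2n} & a_{2n-1}\\ a_{2n+1} & a_{2n}\end{bmatrix}, \qquad (2.244)$$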



where (a) follows from the definition of the polyphase components of x and (b) from 
x being WSS. In the DTFT domain: 



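$$A(e^{j\omega}) = \begin{bmatrix} A_0(e^{j\omega}) & e^{-j\omega}A_1(e^{j\omega})\\ A_1(e^{j\omega}) & A_0(e^{j\omega})\end{bmatrix}, \qquad (2.245)$$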



where $A(e^{j\omega})$ is the power spectral density of x, and $A_0(e^{j\omega})$ and $A_1(e^{j\omega})$ are
the DTFTs of the polyphase components of $a_n$. The matrix $A(e^{j\omega})$ is positive
semidefinite, which we now prove. For simplicity, we assume real entries. First,
we know that $A(e^{j\omega})$ is an even function of ω, (2.97c), and nonnegative, (2.97b).
Furthermore,



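$$A_0(e^{2j\omega}) = \tfrac{1}{2}\big(A(e^{j\omega}) + A(e^{j(\omega+\pi)})\big), \qquad A_1(e^{2j\omega}) = \tfrac{1}{2}\,e^{j\omega}\big(A(e^{j\omega}) - A(e^{j(\omega+\pi)})\big). \qquad (2.246)$$

We thus get that

$$A_0(e^{2j\omega}) + e^{-j\omega}A_1(e^{2j\omega}) = A(e^{j\omega}) \overset{(a)}{\geq} 0, \qquad A_0(e^{2j\omega}) - e^{-j\omega}A_1(e^{2j\omega}) = A(e^{j(\omega+\pi)}) \overset{(b)}{\geq} 0, \qquad (2.247)$$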
where (a) follows from (2.97b) and (b) from (2.97a)-(2.97b). Then,

$$\big(A_0(e^{2j\omega}) + e^{-j\omega}A_1(e^{2j\omega})\big)\big(A_0(e^{2j\omega}) - e^{-j\omega}A_1(e^{2j\omega})\big) = A_0^2(e^{2j\omega}) - e^{-2j\omega}A_1^2(e^{2j\omega}) \geq 0. \qquad (2.248)$$

To prove that $A(e^{j\omega})$ is positive semidefinite, we must prove that all of its principal
minors are nonnegative (see Section 1.B.2). In this case, that means
$A_0(e^{j\omega}) \geq 0$, which we know from (2.97b) and (2.246), as well as

$$\det(A(e^{j\omega})) = A_0^2(e^{j\omega}) - e^{-j\omega}A_1^2(e^{j\omega}) \overset{(a)}{\geq} 0,$$

where (a) follows from (2.248), proving that $A(e^{j\omega})$ is positive semidefinite.



Example 2.36 (First-order AR process, Example 2.32 cont'd) We illustrate
this with an example. The proof follows a different path from the one we
have just seen and is more explicit. As a reminder, we are looking at a WSS
sequence x and the vector of its polyphase components as in (2.213). We take
the input to be an AR-1 process x with power spectral density as in (2.234),
which we know is positive definite when |a| < 1. The matrix $A(e^{j\omega})$ is then



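$$A(e^{j\omega}) = \frac{1-a^4}{(1-a^2e^{-j\omega})(1-a^2e^{j\omega})}\begin{bmatrix} 1 & \frac{a}{1+a^2}(1+e^{-j\omega})\\ \frac{a}{1+a^2}(1+e^{j\omega}) & 1\end{bmatrix}. \qquad (2.249)$$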



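To check that the matrix is positive definite, we compute $y^T A(e^{j\omega})\,y$ for an
arbitrary $y = [\cos\theta \;\; \sin\theta]^T$:

$$[\cos\theta \;\; \sin\theta]\; A(e^{j\omega}) \begin{bmatrix}\cos\theta\\ \sin\theta\end{bmatrix} = \frac{(1-a^2)\big((1+a^2) + a(1+\cos\omega)\sin 2\theta\big)}{1+a^4-2a^2\cos\omega}.$$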
The denominator of the above expression is always positive since

$$1 + a^4 - 2a^2\cos\omega \;\geq\; 1 + a^4 - 2a^2 = (1-a^2)^2 \overset{(a)}{>} 0,$$

where (a) follows from |a| < 1. Then, we just have to show that the following
expression is nonnegative:

$$(1+a^2) + a(1+\cos\omega)\sin 2\theta \overset{(a)}{\geq} 1 + a^2 - (1+\cos\omega)|a| \overset{(b)}{\geq} 1 + a^2 - 2|a| = (1-|a|)^2 \overset{(c)}{>} 0,$$

where (a) follows from $(1+\cos\omega) \geq 0$ and $a\sin 2\theta \geq -|a|$; (b) from $1 + \cos\omega \leq 2$;
and (c) from |a| < 1.

We could have saved ourselves all this computation had we observed that

$$A(e^{j\omega}) = U(e^{j\omega})\,U^T(e^{-j\omega}), \qquad (2.250a)$$

with

$$U(e^{j\omega}) = \frac{\sqrt{1-a^2}}{1-a^2e^{-j\omega}}\begin{bmatrix} 1 & ae^{-j\omega}\\ a & 1\end{bmatrix}, \qquad (2.250b)$$

making it obvious that $A(e^{j\omega})$ is positive semidefinite.
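The matrix (2.249) and its factorization (2.250) can be checked numerically against the definitions (2.244) and (2.245). This sketch assumes, as in Example 2.32, unit-variance white input, so that $a_n = a^{|n|}$ (our computation). It builds the matrix entries as truncated DTFTs of $a_{2n}$, $a_{2n-1}$, and $a_{2n+1}$, compares them with the closed form, compares the closed form with $U(e^{j\omega})U^T(e^{-j\omega})$, and prints the determinant. The 2 × 2 helpers are ours, written with nested lists to stay dependency-free.

```python
import cmath
import math

def dtft(seq, w, n_max=200):
    """DTFT sum_n s_n e^{-jwn} of a sequence given as a callable n -> s_n, truncated to |n| <= n_max."""
    return sum(seq(n) * cmath.exp(-1j * w * n) for n in range(-n_max, n_max + 1))

def matmul2(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def max_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))

def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def matrix_autocorr_numeric(a, w):
    """(2.244)-(2.245) with a_n = a^{|n|}: DTFTs of a_{2n}, a_{2n-1} (top row) and a_{2n+1}, a_{2n}."""
    acf = lambda n: a ** abs(n)
    A00 = dtft(lambda n: acf(2 * n), w)
    A01 = dtft(lambda n: acf(2 * n - 1), w)
    A10 = dtft(lambda n: acf(2 * n + 1), w)
    return [[A00, A01], [A10, A00]]

def matrix_autocorr_closed(a, w):
    """Closed form (2.249)."""
    e = cmath.exp(-1j * w)
    c = (1 - a ** 4) / ((1 - a * a * e) * (1 - a * a / e))
    r = a / (1 + a * a)
    return [[c, c * r * (1 + e)], [c * r * (1 + 1 / e), c]]

def spectral_factor(a, w):
    """(2.250b): U(e^{jw}) = sqrt(1 - a^2) / (1 - a^2 e^{-jw}) * [[1, a e^{-jw}], [a, 1]]."""
    e = cmath.exp(-1j * w)
    s = math.sqrt(1 - a * a) / (1 - a * a * e)
    return [[s, s * a * e], [s * a, s]]

def factor_product(a, w):
    """(2.250a): U(e^{jw}) U^T(e^{-jw})."""
    U, Um = spectral_factor(a, w), spectral_factor(a, -w)
    UmT = [[Um[0][0], Um[1][0]], [Um[0][1], Um[1][1]]]
    return matmul2(U, UmT)

a = 0.9
for w in (0.0, 0.7, 2.0, math.pi):
    Ac = matrix_autocorr_closed(a, w)
    print(f"w = {w:.2f}: |numeric - (2.249)| = {max_diff(matrix_autocorr_numeric(a, w), Ac):.1e}, "
          f"|(2.249) - U U^T| = {max_diff(Ac, factor_product(a, w)):.1e}, det = {det2(Ac).real:.4f}")
```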



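Example 2.37 (First-order MA process, Example 2.33 cont'd) The matrix
autocorrelation is

$$A(e^{j\omega}) = \begin{bmatrix} A_0(e^{j\omega}) & e^{-j\omega}A_1(e^{j\omega})\\ A_1(e^{j\omega}) & A_0(e^{j\omega})\end{bmatrix} = \begin{bmatrix} 1 & \frac{1}{2}(1+e^{-j\omega})\\ \frac{1}{2}(1+e^{j\omega}) & 1\end{bmatrix} = \underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & e^{-j\omega}\\ 1 & 1\end{bmatrix}}_{U(e^{j\omega})}\;\underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ e^{j\omega} & 1\end{bmatrix}}_{U^T(e^{-j\omega})},$$

and is thus clearly positive semidefinite since it affords a spectral factorization.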

We now go through some basic results involving multirate components with
WSS or WSCS inputs; a summary is given in Table 2.10.



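$$\begin{array}{lll}
\text{System} & \text{Input } x & \text{Output } y\\ \hline
\text{Downsampling by } N & \text{WSS} & \text{WSS}\\
 & \text{WSCS}_N & \text{WSS}\\
 & \text{WSCS}_M & \text{WSCS}_L,\ L = M/\gcd(M,N)\\
\text{Upsampling by } N & \text{WSS} & \text{WSCS}_N\\
\text{Filtering by } h & \text{WSS} & \text{WSS}\\
 & \text{WSCS}_N & \text{WSCS}_N\\
\text{Filtering followed by downsampling} & \text{WSS} & \text{WSS}\\
\text{Upsampling followed by filtering} & \text{WSS} & \text{WSCS}_N\\
 & \text{WSS} & \text{WSS (filter has alias-free support)}\\
\text{Rational change } (M \text{ up}, N \text{ down}) & \text{WSS} & \text{WSCS}_L,\ L = M/\gcd(M,N)
\end{array}$$

Table 2.10: Summary of results for multirate systems with stochastic inputs.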



Downsampling Given is a downsampler by N as in (2.183). Then, if x is WSS, y
is WSS as well:

$$a_{y,k,n} = E[y_k y_{k-n}^*] \overset{(a)}{=} E[x_{Nk}x_{N(k-n)}^*] \overset{(b)}{=} a_{x,Nn} = a_{y,n}, \qquad (2.251)$$

where (a) follows from (2.183), and (b) from x being WSS.
If x is WSCS$_N$, then y is WSS as well,

$$a_{y,k,n} = E[y_k y_{k-n}^*] \overset{(a)}{=} E[x_{Nk}x_{N(k-n)}^*] \overset{(b)}{=} a_{x,0,Nn} = a_{y,n},$$

where again, (a) follows from (2.183); and (b) from x being WSCS$_N$, (2.243b).
The above two cases are special cases of the more general fact that if x is
WSCS$_M$, then y is WSCS$_L$, with $L = M/\gcd(M,N)$. For this, and other special
cases, see Further Reading.

In the DTFT domain, the power spectral density of the output is given by

$$A_y(e^{j\omega}) \overset{(a)}{=} \sum_{n\in\mathbb{Z}} a_{y,n}\,e^{-j\omega n} \overset{(b)}{=} \sum_{n\in\mathbb{Z}} a_{x,Nn}\,e^{-j\omega n} \overset{(c)}{=} \frac{1}{N}\sum_{k=0}^{N-1} A_x(e^{j(\omega-2\pi k)/N}), \qquad (2.252)$$

where (a) follows from the definition of the power spectral density (2.232); (b) from
(2.251); and (c) from the expression for downsampling by N, (2.184).
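Relation (2.252) can be checked on the AR-1 example. Assuming unit-variance white input, so that $a_{x,n} = a^{|n|}$ (our computation), downsampling by N gives $a_{y,n} = a^{N|n|}$ by (2.251). The sketch compares the aliasing sum on the right of (2.252) with a truncated DTFT of $a^{N|n|}$; for N = 2 it also compares with the AR-1 form $(1-a^4)/|1-a^2e^{-j\omega}|^2$, our own rewriting of the same quantity.

```python
import cmath
import math

def ar1_psd(a, w):
    """(2.234): PSD of a unit-norm AR-1 process, whose autocorrelation is a_{x,n} = a^{|n|}."""
    return (1 - a * a) / abs(1 - a * cmath.exp(-1j * w)) ** 2

def downsampled_psd_formula(a, N, w):
    """(2.252): (1/N) sum_{k=0}^{N-1} A_x(e^{j(w - 2 pi k)/N})."""
    return sum(ar1_psd(a, (w - 2 * math.pi * k) / N) for k in range(N)) / N

def downsampled_psd_direct(a, N, w, n_max=300):
    """DTFT of a_{y,n} = a_{x,Nn} = a^{N|n|}, from (2.251), truncated to |n| <= n_max."""
    return sum((a ** (N * abs(n))) * cmath.exp(-1j * w * n) for n in range(-n_max, n_max + 1)).real

a = 0.9
for N in (2, 3):
    for w in (0.0, 1.0, 2.5, math.pi):
        print(N, w, round(downsampled_psd_formula(a, N, w), 9), round(downsampled_psd_direct(a, N, w), 9))
```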



Upsampling Given is an upsampler by N as in (2.188). Then, if x is WSS, y is
WSCS$_N$. This is easily seen if we remember Definition 2.18: y will be WSCS$_N$ if
the vector of its polyphase components is WSS. All polyphase components of y, except
for the first one, are zero, and are thus WSS. The first polyphase component is just
the input sequence x, which is WSS by assumption.

In the DTFT domain, the power spectral density of the output is given by

$$A_y(e^{j\omega}) = A_x(e^{jN\omega}), \qquad (2.253)$$

from the expression for upsampling by N, (2.189).



Filtering We need one more element, a filter, to be able to build basic multirate
systems. The following holds: given a WSCS$_N$ input sequence x and an LSI system h
(more generally, an LPSV system with period N), the output y will also be WSCS$_N$.
For an LSI system h, the proof is straightforward:

$$\begin{aligned}
a_{y,k+N,n} = E[y_{k+N}y_{k+N-n}^*] &\overset{(a)}{=} E\Big[\sum_{m\in\mathbb{Z}} h_m x_{k+N-m}\sum_{\ell\in\mathbb{Z}} h_\ell^* x_{k+N-n-\ell}^*\Big] \overset{(b)}{=} \sum_{m,\ell} h_m h_\ell^*\, E[x_{k+N-m}x_{k+N-n-\ell}^*]\\
&\overset{(c)}{=} \sum_{m,\ell} h_m h_\ell^*\, a_{x,k+N-m,\,n-m+\ell} \overset{(d)}{=} \sum_{m,\ell} h_m h_\ell^*\, a_{x,k-m,\,n-m+\ell} = \sum_{m,\ell} h_m h_\ell^*\, E[x_{k-m}x_{k-n-\ell}^*]\\
&= E\Big[\sum_{m\in\mathbb{Z}} h_m x_{k-m}\sum_{\ell\in\mathbb{Z}} h_\ell^* x_{k-n-\ell}^*\Big] = E[y_k y_{k-n}^*] = a_{y,k,n},
\end{aligned}$$

where (a) follows from the convolution expression (2.58); (b) from linearity of
expectation and h being deterministic; (c) from the expression for the autocorrelation
of x, (2.223); and (d) from x being WSCS$_N$, (2.243b). For the DTFT-domain
expression of the autocorrelation of a filtered sequence, see (2.236).

Filtering Followed by Downsampling Finally, we put together some simple
combinations of the basic operations we have seen so far. Start with filtering followed
by downsampling as in Figure 2.19. We have already seen that downsampling does
not change the nature of the sequence, nor does filtering. Thus, if x is WSS, y is
WSS as well. In the DTFT domain, using (2.236) and (2.252), we get

$$A_y(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} A_g(e^{j(\omega-2\pi k)/N})\,A_x(e^{j(\omega-2\pi k)/N}), \qquad (2.254)$$

where $A_g(e^{j\omega}) = |G(e^{j\omega})|^2$ is the DTFT of the deterministic autocorrelation of g.

Upsampling Followed by Filtering We now look at upsampling followed by
filtering, as shown in Figure 2.20. Then, if x is WSS, y is WSCS$_N$. To see that, we use
what we have shown so far. We know that if x is WSS, the output of the upsampler
will be WSCS$_N$. As this is the input to an LSI (and consequently LPSV) system,
we have shown that its output will be WSCS$_N$. We illustrate this with an example:



Example 2.38 (Upsampling and filtering, Example 2.27 cont'd) Given
is an AWGN sequence x as the input to the system in Figure 2.20 with
$g_n = \delta_n + \delta_{n-1}$ as in Example 2.27. After upsampling, the first polyphase component
is just x itself, so it is WSS, while the second polyphase component is all zero, and
is thus WSS as well. The output of the upsampler is not WSS, since clearly (2.224)
is not satisfied; it is thus WSCS$_2$. After filtering, the output is the
staircase sequence from (2.200). This discrete stochastic process is not WSS, but
is WSCS$_2$, as each of its two polyphase components is now equal to x.
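Example 2.38 is easy to simulate. The sketch below upsamples white Gaussian noise by 2, filters with $g_n = \delta_n + \delta_{n-1}$ to obtain the staircase output $y_{2k} = y_{2k+1} = x_k$, and estimates the lag-1 correlation separately for even and odd k (time averages per phase, assuming ergodicity). The two phases give different values, so y is not WSS, while each phase on its own has stationary statistics. The seed and length are arbitrary.

```python
import random

rng = random.Random(4)
K = 100_000
x = [rng.gauss(0.0, 1.0) for _ in range(K)]      # AWGN input, N(0, 1)

# Upsample by 2, then filter with g_n = delta_n + delta_{n-1}: y_{2k} = y_{2k+1} = x_k.
u = [0.0] * (2 * K)
u[0::2] = x
y = [u[n] + (u[n - 1] if n >= 1 else 0.0) for n in range(2 * K)]

def lag_corr(y, phase, n):
    """Estimate E[y_k y_{k-n}] over k = phase (mod 2) with k - n >= 0."""
    ks = [k for k in range(phase, len(y), 2) if k - n >= 0]
    return sum(y[k] * y[k - n] for k in ks) / len(ks)

print("lag 1, even k:", round(lag_corr(y, 0, 1), 3))   # model: E[x_k x_{k-1}] = 0
print("lag 1, odd k: ", round(lag_corr(y, 1, 1), 3))   # model: E[x_k x_k] = 1
```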

In fact, an important result is that a necessary and sufficient condition for the
output of upsampling and filtering to be WSS for a WSS input is for the filter g
to have alias-free support. One could think of this as an ideal Nth-band filter or,
more generally, a filter that would extract pieces of the repeated contracted input
spectrum from Figure 2.18(c) so as to reconstruct one full input spectrum.

In the DTFT domain, using (2.236) and (2.253), we get

$$A_y(e^{j\omega}) = A_g(e^{j\omega})\,A_x(e^{jN\omega}). \qquad (2.255)$$

Rational Sampling Rate Change Suppose now we have a combination of upsampling
by M, followed by filtering, followed by downsampling by N. We can say that
if x is WSS, then y is WSCS$_L$, with $L = M/\gcd(M,N)$. This follows directly from
the fact we just proved on upsampling followed by filtering (which yields a WSCS$_M$
sequence) and applying the result on downsampling.



2.9 Computational Aspects 

In this section we give an overview of FFT algorithms and their direct application 
to computation of circular convolutions, as well as linear convolutions. We then 
discuss the complexity of some multirate operations. 
2.9.1 Fast Fourier Transforms

We have thus far studied the DFT as the natural analysis tool for either periodic 
sequences, or finite-length sequences circularly extended. We now study fast algo- 
rithms for computing the DFT called fast Fourier transform (FFT) algorithms. 

The DFT, defined in (2.159a), is a sum of N complex terms, each the product
of a complex constant (a power of $W_N$) and a component of the input sequence
x (a vector-vector product). Thus, each DFT coefficient can be computed with N
multiplications and (N - 1) additions.$^{64}$ There are N such vector-vector products,
and thus, the full DFT can be computed with $\mu = N^2$ multiplications and
$\nu = N(N-1)$ additions, for a total cost of

$$C_{\text{DFT,direct}} = N(N-1) + N^2 = 2N^2 - N \sim O(N^2), \qquad (2.256)$$

exactly the cost of a direct matrix-vector multiplication of an N × N DFT matrix
$F_N$ by an N × 1 vector x, as in (1.166b).
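The count in (2.256) can be made concrete with a direct DFT that tallies its complex operations. This is a minimal sketch assuming $W_N = e^{-j2\pi/N}$ and counting, as the text does, N multiplications (including the one by $W_N^0$) and N - 1 additions per coefficient; the powers of $W_N$ are treated as precomputed constants, so their evaluation is not counted.

```python
import cmath

def dft_direct(x):
    """Direct DFT X_k = sum_n x_n W_N^{kn}, with W_N = e^{-j 2 pi / N}.
    Returns (X, multiplications, additions), counted as in (2.256)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    X, mults, adds = [], 0, 0
    for k in range(N):
        acc = x[0] * W ** 0          # first product, counted as a multiplication
        mults += 1
        for n in range(1, N):
            acc += x[n] * W ** (k * n)
            mults += 1
            adds += 1
        X.append(acc)
    return X, mults, adds

X, mu, nu = dft_direct([1, 2, 3, 4])
print([complex(round(v.real, 12), round(v.imag, 12)) for v in X])   # approximately [10, -2+2j, -2, -2-2j]
print(mu, nu, mu + nu)    # 16 12 28, i.e. N^2, N(N-1), and 2N^2 - N for N = 4
```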



Radix-2 FFT The special structure of the DFT allows its computation with far
fewer than $O(N^2)$ operations, based on decomposing it into smaller DFTs and a
few simple operations to combine the results. For illustration purposes, we consider
in detail only $N = 2^k$, $k \in \mathbb{Z}^+$, and only briefly comment on fast algorithms for
other values of N.

Starting with the definition of the DFT (2.159a) , write 



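$$\begin{aligned}
X_k = \sum_{n=0}^{N-1} x_n W_N^{kn} &\overset{(a)}{=} \sum_{n=0}^{N/2-1} x_{2n} W_N^{k(2n)} + \sum_{n=0}^{N/2-1} x_{2n+1} W_N^{k(2n+1)} \\
&\overset{(b)}{=} \sum_{n=0}^{N/2-1} x_{2n} W_{N/2}^{kn} + W_N^{k} \sum_{n=0}^{N/2-1} x_{2n+1} W_{N/2}^{kn},
\end{aligned} \tag{2.257}$$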

where (a) separates the summation over even- and odd-numbered terms, and (b) follows from $W_N^2 = W_{N/2}$. Recognize the first sum as the length-$N/2$ DFT of the sequence $[x_0\; x_2\; \ldots\; x_{N-2}]$, and the second sum as the length-$N/2$ DFT of the sequence $[x_1\; x_3\; \ldots\; x_{N-1}]$. It is now apparent that the length-$N$ DFT computation can make use of $F_{N/2} D_2 x$ and $F_{N/2} C_2 x$, where $D_2$ is the downsampling-by-2 operator defined in (2.180b), and $C_2$ is a similar operator, except that it keeps the odd-indexed values. Since the length-$N/2$ DFT is $(N/2)$-periodic in $k$, (2.257) can be used both for $k \in \{0, 1, \ldots, N/2 - 1\}$ and for $k \in \{N/2, N/2 + 1, \ldots, N - 1\}$. To get a compact matrix representation, we introduce the diagonal matrix

$$\Lambda_{N/2} = \operatorname{diag}\big(1, W_N, \ldots, W_N^{N/2-1}\big),$$

64 We are counting complex multiplications and complex additions. It is customary not to count multiplications by $-1$, and thus to lump together additions and subtractions.
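and rewrite (2.257) as

$$\begin{bmatrix} X_0 \\ \vdots \\ X_{N/2-1} \end{bmatrix} = F_{N/2} D_2 x + \Lambda_{N/2} F_{N/2} C_2 x, \tag{2.258a}$$

$$\begin{bmatrix} X_{N/2} \\ \vdots \\ X_{N-1} \end{bmatrix} = F_{N/2} D_2 x - \Lambda_{N/2} F_{N/2} C_2 x, \tag{2.258b}$$

where the final twist was to realize that $W_N^{k} = -W_N^{k-N/2}$, leading to

$$X = F_N x = \begin{bmatrix} I_{N/2} & I_{N/2} \\ I_{N/2} & -I_{N/2} \end{bmatrix} \begin{bmatrix} I_{N/2} & 0 \\ 0 & \Lambda_{N/2} \end{bmatrix} \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} \begin{bmatrix} D_2 \\ C_2 \end{bmatrix} x. \tag{2.259}$$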



If this turns out to be a useful factorization, then we can repeat it to represent $F_{N/2}$ using $F_{N/4}$, etc., until we reach $F_2$, which requires no multiplications, $\mu_2 = 0$, and only $\nu_2 = 2$ additions. Let us count computations in the factored form.

With $\mu_N$ and $\nu_N$ the number of multiplications and additions in computing a length-$N$ DFT, the factorization (2.259) shows that a length-$N$ DFT can be computed using two length-$N/2$ DFTs, $N/2$ multiplications and $N$ additions. Iterating on the length-$N/2$ DFTs, then the length-$N/4$ DFTs, and so on, leads to the following recursions:



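$$\begin{aligned}
\nu_N &= 2\nu_{N/2} + N = 2^2\nu_{N/2^2} + 2N = 2^3\nu_{N/2^3} + 3N = \cdots = \frac{N}{2}\nu_2 + (\log_2 N - 1)N = N\log_2 N,\\
\mu_N &= 2\mu_{N/2} + \frac{N}{2} = 2^2\mu_{N/2^2} + 2\,\frac{N}{2} = \cdots = \frac{N}{2}\mu_2 + (\log_2 N - 1)\frac{N}{2} = \frac{N}{2}\log_2 N - \frac{N}{2},\\
C_{\text{DFT,radix-2}} &= \nu_N + \mu_N = \frac{3}{2}N\log_2 N - \frac{N}{2} \sim O(N\log_2 N).
\end{aligned} \tag{2.260}$$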



Thus, recursive application of (2.259) reduces the cost from $O(N^2)$ in (2.256) to $O(N\log_2 N)$ in (2.260). We illustrate the above procedure with a simple example:



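Example 2.39 (Computation of the length-4 FFT) We check that the factorization (2.259) for $N = 4$,

$$\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -j \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix},$$

equals the length-4 DFT matrix, exactly (2.161a). We can also write out (2.257):

$$\begin{aligned}
X_k &= \sum_{n=0}^{3} x_n W_4^{kn} = \sum_{n=0}^{1} x_{2n} W_4^{k(2n)} + \sum_{n=0}^{1} x_{2n+1} W_4^{k(2n+1)} \\
&= \sum_{n=0}^{1} x_{2n} W_2^{kn} + W_4^{k} \sum_{n=0}^{1} x_{2n+1} W_2^{kn} \\
&= \big(x_0 W_2^{0} + x_2 W_2^{k}\big) + W_4^{k}\big(x_1 W_2^{0} + x_3 W_2^{k}\big) \\
&= \big(x_0 + (-1)^k x_2\big) + W_4^{k}\big(x_1 + (-1)^k x_3\big),
\end{aligned}$$

equivalent to computing one DFT of length 2 on the even samples, then one DFT of length 2 on the odd samples, and finally multiplying the latter by the constant $W_4^k$ before adding the two.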
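The recursion behind (2.257) to (2.260) can be sketched in Python as follows. This is a minimal illustration, not an optimized FFT; the function names and the way operations are tallied are our own, chosen to match the counting convention of the text ($N/2$ multiplications, trivial twiddles included, and $N$ additions per combining stage; 0 multiplications and 2 additions for $F_2$).

```python
import cmath


def fft_radix2(x):
    """Radix-2 FFT following the factorization (2.259).

    Returns (X, mults, adds). Operations are tallied as in the text: a
    length-N combining stage costs N/2 multiplications (by the entries of
    Lambda_{N/2}, trivial ones included) and N additions, and a length-2
    DFT costs 0 multiplications and 2 additions.
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2, at least 2")
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]], 0, 2
    # F_{N/2} D_2 x (even-indexed samples) and F_{N/2} C_2 x (odd-indexed samples)
    even, m_even, a_even = fft_radix2(x[0::2])
    odd, m_odd, a_odd = fft_radix2(x[1::2])
    half = n // 2
    # Multiply by Lambda_{N/2} = diag(1, W_N, ..., W_N^{N/2-1}), W_N = exp(-j 2 pi / N)
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    top = [even[k] + twiddled[k] for k in range(half)]     # X_0 ... X_{N/2-1}, (2.258a)
    bottom = [even[k] - twiddled[k] for k in range(half)]  # X_{N/2} ... X_{N-1}, (2.258b)
    return top + bottom, m_even + m_odd + half, a_even + a_odd + n


def dft_direct(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n W_N^{kn}, for comparison."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.0]
X, mults, adds = fft_radix2(x)
print(mults, adds)  # 8 24
```

For $N = 8$ the tallies reproduce $\mu_8 = 8$ and $\nu_8 = 24$ from (2.260), and comparing `X` with `dft_direct(x)` checks the factorization numerically.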

Other FFT Algorithms A more general, and the most famous, class of FFT algorithms, the Cooley-Tukey FFT, works for any composite length $N = N_1 N_2$. The algorithm breaks down a length-$N$ DFT into $N_2$ length-$N_1$ DFTs, $N$ complex (twiddle) factors, and $N_1$ length-$N_2$ DFTs. Often, either $N_1$ or $N_2$ is a small factor called a radix.
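A rough sketch of one Cooley-Tukey step for $N = N_1 N_2$ in Python, assuming the index maps $n = N_2 a + b$ (input) and $k = c + N_1 d$ (output), one common choice among several. The small DFTs are evaluated directly for clarity; a practical implementation would recurse on them. The three stages mirror the description above: $N_2$ length-$N_1$ DFTs, $N$ twiddle factors $W_N^{bc}$, and $N_1$ length-$N_2$ DFTs.

```python
import cmath


def dft(x):
    """Direct DFT, X_k = sum_n x_n W_N^{kn} with W_N = exp(-j 2 pi / N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def cooley_tukey_step(x, n1, n2):
    """One Cooley-Tukey step for N = n1 * n2.

    Index maps (one common choice): n = n2*a + b for the input and
    k = c + n1*d for the output, with 0 <= a, c < n1 and 0 <= b, d < n2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Stage 1: n2 DFTs of length n1, one per input residue b
    inner = [dft([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Stage 2: n complex (twiddle) factors W_N^{b c}
    for b in range(n2):
        for c in range(n1):
            inner[b][c] *= cmath.exp(-2j * cmath.pi * b * c / n)
    # Stage 3: n1 DFTs of length n2, one per output residue c
    X = [0j] * n
    for c in range(n1):
        outer = dft([inner[b][c] for b in range(n2)])
        for d in range(n2):
            X[c + n1 * d] = outer[d]
    return X
```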

The Good-Thomas FFT works for $N = N_1 N_2$ where $N_1$ and $N_2$ are coprime. It is based on the Chinese remainder theorem and avoids the complex factors of the Cooley-Tukey FFT. The algorithm thus breaks down a length-$N$ DFT into $N_2$ length-$N_1$ DFTs and $N_1$ length-$N_2$ DFTs, equivalent to a two-dimensional $(N_1 \times N_2)$ DFT.
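The following Python sketch of the Good-Thomas (prime-factor) algorithm is illustrative only: it assumes the input map $n = (N_2 a + N_1 b) \bmod N$ and places each output with the Chinese remainder theorem, so that $W_N^{nk} = W_{N_1}^{ac} W_{N_2}^{bd}$ and no twiddle factors appear. It uses the three-argument `pow` with exponent $-1$ for modular inverses, which requires Python 3.8 or later; the function names are ours.

```python
import cmath
import math


def dft(x):
    """Direct DFT, X_k = sum_n x_n W_N^{kn} with W_N = exp(-j 2 pi / N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def good_thomas(x, n1, n2):
    """Prime-factor (Good-Thomas) DFT for N = n1 * n2 with gcd(n1, n2) = 1.

    Input map: n = (n2*a + n1*b) mod N. Output map (Chinese remainder
    theorem): X at the unique k with k = c mod n1 and k = d mod n2.
    With these maps W_N^{nk} = W_{n1}^{ac} W_{n2}^{bd}: no twiddle factors.
    """
    n = n1 * n2
    if math.gcd(n1, n2) != 1 or len(x) != n:
        raise ValueError("need coprime n1, n2 and len(x) == n1 * n2")
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    rows = [dft(row) for row in grid]                                 # n1 DFTs of length n2
    cols = [dft([rows[a][d] for a in range(n1)]) for d in range(n2)]  # n2 DFTs of length n1
    e1 = n2 * pow(n2, -1, n1)  # congruent to 1 mod n1 and to 0 mod n2
    e2 = n1 * pow(n1, -1, n2)  # congruent to 0 mod n1 and to 1 mod n2
    X = [0j] * n
    for c in range(n1):
        for d in range(n2):
            X[(c * e1 + d * e2) % n] = cols[d][c]
    return X
```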

Rader's FFT works for prime lengths $N$. It is based on mapping the computation of the DFT into a computation of a circular convolution of length $N - 1$ (recall that (2.177) shows that the DFT diagonalizes the circular convolution operator). Winograd extended Rader's FFT to include powers of prime lengths, and it is sometimes considered a subclass of the Winograd FFT, which we discuss next.

The Winograd FFT is often used for small factors. It is based on considering a length-$N_1 N_2$ DFT as a two-dimensional DFT of size $(N_1 \times N_2)$, as we have seen for the Good-Thomas algorithm. If $N_1$ and $N_2$ are prime, we can use Rader's FFT on each of $N_1$ and $N_2$. While it is less costly in terms of required additions and multiplications, it is also more complicated and thus not often used.
The split-radix FFT is used for $N$ that are multiples of 4; it recursively splits a length-$N$ DFT into one length-$N/2$ DFT and two length-$N/4$ DFTs, and has among the lowest known operation counts for $N = 2^k$, $k > 1$.

Remember that the cost in terms of additions and multiplications is just one measure of how fast an algorithm can be computed; many other factors come into play, including the specific computing platform. As a result, the Cooley-Tukey FFT is still the prevalent one, despite some of the other algorithms having a lower multiplication, addition, or total operation count. Moreover, as we have just seen, there exists an FFT for any given $N$, and each of them is $O(N\log_2 N)$, carrying the cost of

$$C_{\text{DFT,FFT}} = \alpha N\log_2 N \sim O(N\log_2 N), \tag{2.261}$$

where $\alpha$ is a small constant dependent on the actual FFT algorithm.

2.9.2 Convolution 

In discussing convolution, for the first time we will encounter the issue of computing on a finite-length input (offline computation) or an infinite-length one (online computation). In the first case, we will be computing the cost per block of input samples, while in the second, we will be computing the cost per input sample.

Computing Circular Convolution Since the DFT operator diagonalizes circular convolution, it is easy to estimate its cost. Using (2.177), computing $Hx$ requires a length-$N$ DFT, $N$ pointwise multiplications by the diagonal matrix entries (the DFT of the filter, computed once in advance), and a length-$N$ inverse DFT. Using (2.261) for the cost of the DFT, computing the circular convolution carries the cost of

$$C_{\text{cconv,freq}} = 2\alpha N\log_2 N + N \sim O(N\log_2 N), \tag{2.262}$$

per $N$ input samples.
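To make the diagonalization argument concrete, here is a minimal Python sketch of circular convolution through the DFT, assuming $N$ is a power of 2 so that a simple recursive radix-2 FFT can be used. Unlike the cost count in (2.262), it recomputes the filter's DFT on each call; the helper names are ours.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def ifft(X):
    """Inverse DFT through the forward FFT: x = conj(F conj(X)) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def circular_convolution_fft(h, x):
    """Length-N circular convolution: DFT, N pointwise products, inverse DFT."""
    H = fft(h)  # in (2.262) this is assumed precomputed (fixed filter)
    return ifft([a * b for a, b in zip(H, fft(x))])


def circular_convolution_direct(h, x):
    """Reference: y_n = sum_k h_k x_{(n - k) mod N}."""
    n = len(x)
    return [sum(h[k] * x[(m - k) % n] for k in range(n)) for m in range(n)]
```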



Computing Linear Convolution We start with a straightforward implementation of linear convolution in the time domain (2.78) for a finite-length input $x$. Without loss of generality, assume that the input is of length $M$ and the filter $h$ is of length $L \le M$. We need 1 multiplication and 0 additions to compute $y_0 = h_0 x_0$, 2 multiplications and 1 addition for $y_1 = h_1 x_0 + h_0 x_1$, all the way to $L$ multiplications and $(L - 1)$ additions for $y_{L-1} = \sum_{k=0}^{L-1} x_k h_{L-1-k}$, $y_L, \ldots, y_{M-1}$, and then back down to 1 multiplication and 0 additions for $y_{M+L-2} = h_{L-1} x_{M-1}$, leading to

$$\begin{aligned}
C_{\text{lconv,time}} &= \nu + \mu = \Big(2\sum_{k=1}^{L-2} k + (M - L + 1)(L - 1)\Big) + \Big(2\sum_{k=1}^{L-1} k + (M - L + 1)L\Big) \\
&= (L-2)(L-1) + (M-L+1)(L-1) + (L-1)L + (M-L+1)L \\
&= 2ML - M - L + 1 \sim O(ML),
\end{aligned} \tag{2.263}$$

per $M$ input samples.
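A short Python sketch that evaluates the linear convolution directly and tallies operations the same way as the count above; the function name is hypothetical. Summing the tallies reproduces $2ML - M - L + 1$.

```python
def linear_convolution_counted(h, x):
    """Direct linear convolution y_n = sum_k h_k x_{n-k}, tallying operations
    as (2.263) does: an output built from m products costs m multiplications
    and m - 1 additions."""
    L, M = len(h), len(x)
    y, mults, adds = [], 0, 0
    for n in range(M + L - 1):
        terms = [h[k] * x[n - k] for k in range(L) if 0 <= n - k < M]
        mults += len(terms)
        adds += len(terms) - 1
        y.append(sum(terms))
    return y, mults, adds


y, mu, nu = linear_convolution_counted([1, 2, 3], [1, 0, -1, 2, 5])
print(y, mu + nu)  # [1, 2, 2, 0, 6, 16, 15] 23
```

For $L = 3$ and $M = 5$ this gives 15 multiplications and 8 additions, matching $2 \cdot 5 \cdot 3 - 5 - 3 + 1 = 23$.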

In (2.262), we saw a very efficient implementation of the circular convolution using FFTs. We will now show how to reduce the problem of computing the linear convolution of an infinite-length input to that of computing the circular convolution, and use the results we just developed. To that end, we build upon Example 2.13 on the equivalence of circular and linear convolutions.



Example 2.40 (Equivalence of circular and linear convolutions, Example 2.13 cont'd) In Example 2.13 we wanted to compute the linear convolution of a length-3 filter with a length-4 sequence, and found that computing a circular convolution instead was equivalent. We rewrite (2.74b) as follows:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix} = H_6 \begin{bmatrix} I_4 \\ 0_{2 \times 4} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix},$$

where the input vector is now stated explicitly without the trailing zeros, $H_6$ is the $6 \times 6$ circulant matrix as in (2.73), $I_4$ is a $4 \times 4$ identity matrix, and $0_{2 \times 4}$ is a $2 \times 4$ all-zero matrix.

Imagine now that instead of a finite-length input sequence $x$, we have an infinite one. A way to compute a convolution of $h$ with $x$ is to break $x$ into pieces of length 4, and then repeat the above procedure. The only issue is that because $x$ is of infinite length, only outputs $y_2$ and $y_3$ are correct; for example, $y_0$ is not correct as it requires $x_{-1}$ and $x_{-2}$ to compute it, and in the above, we assumed that $x_{-1} = x_{-2} = 0$. Let us look at two blocks of outputs for two consecutive blocks of inputs, $[x_{-4}\; x_{-3}\; x_{-2}\; x_{-1}]$ and $[x_0\; x_1\; x_2\; x_3]$, and denote by a superscript $(p)$ those outputs that are only partially computed in each block,

where on the left we have the first output block and on the right the second output block, and the brackets show to which block the summands belong. What we can see is that the partial outputs in two consecutive blocks complement each other, giving the correct result. For example, adding the partial output $h_2 x_{-2} + h_1 x_{-1}$ from the first block to the partial output $h_0 x_0$ from the second block results in $h_2 x_{-2} + h_1 x_{-1} + h_0 x_0$, the correct result for $y_0$. In other words, we have to offset the second vector down by 4 rows and add the two vectors. We can express this in matrix form as:
We now see why the computation of the convolution as above is efficient: multiplication by $E$ is merely the insertion of zeros (extension), multiplication by $H$ is circular convolution as we have just seen, and multiplication by $A$ (addition) requires only 2 additions for each block of 4 input samples.
We now generalize what we have seen in this simple example. Given a length-$L$ FIR filter and an input vector $x$ processed in blocks of $M$ samples, the above procedure, called the overlap-add algorithm, can be viewed as the following factorization:

$$y = A H E x, \tag{2.264}$$

where $y$ is the output vector, $H$ is a block-diagonal matrix with the circular convolution operator $H_N$ on the diagonal (with $N = L + M - 1$, according to Proposition 2.10), $A$ is the addition matrix with $I_N$ on the diagonal offset by $M$ rows, and $E$ is the extension matrix with $I_M$ on the diagonal offset by $N$ rows.

We now compute the cost of the algorithm. As we said, $E$ merely inserts zeros and thus has no cost; $H$ costs $(2\alpha N\log_2 N + N)$ operations per block according to (2.262); and $A$ requires $(N - M)$ additions per block. Dividing by the $M$ input samples in each block gives a total cost of

$$C_{\text{lconv,overlap-add}} = \frac{2\alpha}{M} N\log_2 N + \frac{2}{M} N - 1 \sim O(\log_2 N), \tag{2.265}$$

per input sample (the last step assumes $N$ grows proportionally to $M$). In the above, we have not made the substitution $N = L + M - 1$, as in practice, an $N$ larger than that necessary minimum is often chosen.
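A minimal Python sketch of overlap-add, under the assumption $N = M + L - 1$; the circular convolution is evaluated directly to keep the example short, whereas (2.265) assumes FFTs as in (2.262). The comments tag each step with the matrix $E$, $H$ or $A$ of (2.264); the function names are ours.

```python
def circular_convolution(h, x, n):
    """Length-n circular convolution of h and x after zero-padding both to n.
    Evaluated directly here; (2.262) would use FFTs instead."""
    hp = list(h) + [0] * (n - len(h))
    xp = list(x) + [0] * (n - len(x))
    return [sum(hp[k] * xp[(m - k) % n] for k in range(n)) for m in range(n)]


def overlap_add(h, x, M):
    """Linear convolution of a length-L filter h with input x, in blocks of M."""
    L = len(h)
    N = M + L - 1                       # minimum circular length without wrap-around
    y = [0] * (len(x) + L - 1)
    for start in range(0, len(x), M):
        block = x[start:start + M]              # E: zero extension happens inside the helper
        yb = circular_convolution(h, block, N)  # H: length-N circular convolution
        for i, v in enumerate(yb):              # A: overlapping tails (N - M samples) add up
            if start + i < len(y):
                y[start + i] += v
    return y
```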

A dual algorithm to the one we just presented is the overlap-save algorithm, whose cost is similar, with a small advantage of no additions in the final stage (its factorization is essentially the transpose of (2.264)). Its disadvantage, however, is that the DFTs are calculated on denser input vectors (without the inserted zeros), which might make the FFTs more costly.
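For comparison, a minimal sketch of the overlap-save dual in Python, again with a direct circular convolution standing in for the FFT-based one and with our own function names: consecutive input segments of length $N = M + L - 1$ overlap by $L - 1$ samples, and the first $L - 1$ outputs of each circular convolution, corrupted by wrap-around, are discarded instead of added.

```python
def circular_convolution(h, x, n):
    """Length-n circular convolution of h and x after zero-padding both to n."""
    hp = list(h) + [0] * (n - len(h))
    xp = list(x) + [0] * (n - len(x))
    return [sum(hp[k] * xp[(m - k) % n] for k in range(n)) for m in range(n)]


def overlap_save(h, x, M):
    """Linear convolution by overlap-save with M new samples per segment.

    Segments of length N = M + L - 1 overlap by L - 1 samples; in each
    length-N circular convolution the first L - 1 outputs are corrupted by
    wrap-around and are discarded, and the remaining M are kept.
    """
    L = len(h)
    N = M + L - 1
    total = len(x) + L - 1                                # length of the full result
    padded = [0] * (L - 1) + list(x) + [0] * (L - 1 + M)  # zero history and tail
    y = []
    for start in range(0, total, M):
        segment = padded[start:start + N]
        y.extend(circular_convolution(h, segment, N)[L - 1:])  # keep outputs L-1 .. N-1
    return y[:total]
```

No output samples are added together, which is the small advantage mentioned above; in exchange, each circular convolution operates on a full segment of input samples rather than on a zero-padded block.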

2.9.3 Multirate Operations 

The key to improving the computational efficiency in multirate signal processing 
is simple: always operate at the lowest possible sampling frequency (rate). We 
now show this idea in action, bearing in mind that downsampling and upsampling 
have no cost in terms of additions and multiplications (although they might require 
memory access). 

Filtering Followed by Downsampling We start with time-domain computation. Assume we directly compute the convolution $(h * x)$ of the length-$M$ input $x$ and the length-$L$ filter $h$, and simply discard every other sample. Using (2.263), the cost is

$$C_{\text{time,direct}} = 2ML - M - L + 1 \sim O(ML), \tag{2.266}$$

per $M$ input samples. However, we know better than to waste computations this way. As we have seen before, filtering followed by downsampling by 2 is equivalent to computing only the even samples of the convolution as in (2.220). The polyphase components of the input, $x_0$ and $x_1$, are convolved with the polyphase components of the filter, $h_0$ and $h_1$, which are now of half the length (see Figure 2.25). We thus have to compute two convolutions (2.266) at half the length, that is, $ML - M - L + 2$ operations, plus $M/2 + L/2 - 1$ additions to add their results (since each convolution output is now of length $M/2 + L/2 - 1$), for a total cost of

$$C_{\text{time,polyphase}} = ML - \frac{M}{2} - \frac{L}{2} + 1 \sim O(ML), \tag{2.267a}$$

per $M$ input samples, roughly 50% savings, but still $O(ML)$.

Figure 2.29: Cascade of K filters followed by downsamplers.
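The polyphase shortcut can be checked with a short Python sketch (function names ours). With $h_0, h_1$ and $x_0, x_1$ the even and odd polyphase components, the downsampled output is $y_n = (h_0 * x_0)_n + (h_1 * x_1)_{n-1}$, so both half-length convolutions run at the low rate.

```python
def convolve(h, x):
    """Full linear convolution, direct form."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(len(x) + len(h) - 1)]


def filter_then_downsample(h, x):
    """Compute all of h * x, then keep the even-indexed samples."""
    return convolve(h, x)[0::2]


def polyphase_filter_downsample(h, x):
    """Same output at the low rate: y_n = (h_0 * x_0)_n + (h_1 * x_1)_{n-1},
    with h_0, h_1 and x_0, x_1 the even/odd polyphase components."""
    even_branch = convolve(h[0::2], x[0::2])
    odd_branch = [0] + convolve(h[1::2], x[1::2])   # one-sample delay
    out_len = (len(x) + len(h)) // 2                # = ceil((M + L - 1) / 2)
    even_branch += [0] * (out_len - len(even_branch))
    odd_branch += [0] * (out_len - len(odd_branch))
    return [even_branch[n] + odd_branch[n] for n in range(out_len)]
```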

We now investigate what happens if we move to the frequency domain. According to Proposition 2.10, instead of computing the linear convolution, we can compute a circular convolution with a period of at least $(M + L - 1)$ in this case, and then discard every other sample as in (2.266). Using a length-$(M + L)$ DFT, according to (2.262),

$$C_{\text{freq,direct}} = 2\alpha(M + L)\log_2(M + L) + M + 2L \sim O\big((M + L)\log_2(M + L)\big), \tag{2.267b}$$

per $M$ input samples. If, instead, we use (2.220), we will need two convolutions at half the length plus add the results of these convolutions, for a total cost of

$$C_{\text{freq,polyphase}} = 2\alpha(M + L)\log_2(M + L) + \Big(\frac{3}{2} - 2\alpha\Big)M + \Big(\frac{5}{2} - 2\alpha\Big)L - 1 \sim O\big((M + L)\log_2(M + L)\big),$$

per $M$ input samples, still $O((M + L)\log_2(M + L))$ but with savings dependent on the sizes $M$ and $L$. This discussion generalizes straightforwardly to downsampling factors larger than 2.

Example 2.41 (Iteration of filtering followed by downsampling) We have a cascade of $K$ filters followed by downsamplers as in Figure 2.29. Using any of the expressions we just derived for the cost $C$ of one stage, we can calculate the cost for $K$ stages. For the second stage, it is $C/2$ (because it runs at half the rate of the input), for the third stage it is $C/4$, etc., leading to the total cost of

$$C + \frac{C}{2} + \frac{C}{4} + \cdots + \frac{C}{2^{K-1}} \overset{(a)}{=} \Big(2 - \frac{1}{2^{K-1}}\Big) C < 2C, \tag{2.268}$$

where (a) follows from (P1.65-1), the formula for a finite geometric series.

Upsampling Followed by Filtering The operation of upsampling by 2 followed by filtering in the z-transform domain is expressed via (2.199a). As this operation is dual to the operation of filtering followed by downsampling (they are transposes of each other; see (2.194) and (2.197)), it comes as no surprise that both systems have the same cost.
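The dual structure for upsampling by 2 followed by filtering can be sketched the same way (again with our own names): since every other input to the filter is zero, $y_{2n} = (h_0 * x)_n$ and $y_{2n+1} = (h_1 * x)_n$, so the two polyphase filters run at the low rate and their outputs are interleaved.

```python
def convolve(h, x):
    """Full linear convolution, direct form."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(len(x) + len(h) - 1)]


def upsample_then_filter(h, x):
    """Insert a zero between input samples (upsampling by 2), then filter."""
    u = [0] * (2 * len(x) - 1)
    u[0::2] = x
    return convolve(h, u)


def polyphase_upsample_filter(h, x):
    """Same output without filtering the inserted zeros:
    y_{2n} = (h_0 * x)_n and y_{2n+1} = (h_1 * x)_n."""
    y = [0] * (2 * len(x) + len(h) - 2)
    for n, v in enumerate(convolve(h[0::2], x)):
        y[2 * n] = v
    for n, v in enumerate(convolve(h[1::2], x)):
        y[2 * n + 1] = v
    return y
```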
Appendix

2.A Elements of Analysis

2.A.1 Complex Numbers 

The imaginary unit $j$ is defined as a number that satisfies $j^2 = -1$; since $j^2 = -1$ implies $(-j)^2 = -1$, we are simply fixing one of the two solutions to have the name $j$.^65 A complex number $z \in \mathbb{C}$ is then a number of the form

$$z = a + jb, \qquad a, b \in \mathbb{R}. \tag{2.269}$$

In (2.269), $a$ is called the real part while $b$ is called the imaginary part. The complex conjugate of $z$ is denoted by $z^*$ and is by definition

$$z^* = a - jb. \tag{2.270}$$

Any complex number can be represented in polar form as well:

$$z = r e^{j\theta}, \tag{2.271}$$

where $r$ is called the modulus or magnitude and $\theta$ is the argument or phase. Using Euler's formula,

$$e^{j\theta} = \cos\theta + j\sin\theta, \tag{2.272}$$

we can express a complex number further as

$$z = r e^{j\theta} = r(\cos\theta + j\sin\theta). \tag{2.273}$$

This allows us to easily find an integer power $n$ of a complex number as

$$(\cos\theta + j\sin\theta)^n = (e^{j\theta})^n = e^{jn\theta} = \cos n\theta + j\sin n\theta. \tag{2.274}$$

Euler's formula highlights that the argument of a complex number is not unique; adding any integer multiple of $2\pi$ to the argument does not change the number:

$$e^{j(\theta + 2\pi k)} = e^{j\theta} e^{j2\pi k} = e^{j\theta}\big(e^{j2\pi}\big)^k = e^{j\theta}, \qquad \text{for any } k \in \mathbb{Z},$$

since $e^{j2\pi} = 1$. Two other useful relations that can be derived using Euler's formula are:

$$\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}, \qquad \sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}. \tag{2.275}$$

Complex numbers are typically shown in the complex plane. The complex plane has a one-to-one correspondence with $\mathbb{R}^2$, with the real part shown horizontally and the imaginary part vertically. Conversion from polar form $r e^{j\theta}$ to standard (or rectangular) form $a + jb$ is by

$$a = r\cos\theta, \qquad b = r\sin\theta.$$



65 Mathematicians and physicists typically use i for the imaginary unit, while j is more common 
in engineering. 
Conversion from standard form to polar form is simple by just looking at the complex plane, but more complicated to write. One solution is as follows:

$$r = \sqrt{a^2 + b^2}, \qquad
\theta = \begin{cases}
\arctan(b/a), & a > 0;\\
\arctan(b/a) + \pi, & a < 0,\ b \ge 0;\\
\arctan(b/a) - \pi, & a < 0,\ b < 0;\\
\pi/2, & a = 0,\ b > 0;\\
-\pi/2, & a = 0,\ b < 0;\\
\text{undefined}, & a = 0,\ b = 0,
\end{cases}$$

where $\arctan$ returns a value in $(-\pi/2, \pi/2)$.

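Just as a reminder, the basic operations on complex numbers $z_1 = a_1 + jb_1$ and $z_2 = a_2 + jb_2$ are as follows:

$$\begin{aligned}
z_1 + z_2 &= (a_1 + a_2) + j(b_1 + b_2),\\
z_1 - z_2 &= (a_1 - a_2) + j(b_1 - b_2),\\
z_1 z_2 &= (a_1 a_2 - b_1 b_2) + j(b_1 a_2 + a_1 b_2),\\
\frac{z_1}{z_2} &= \frac{a_1 a_2 + b_1 b_2}{a_2^2 + b_2^2} + j\,\frac{a_2 b_1 - a_1 b_2}{a_2^2 + b_2^2}.
\end{aligned}$$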
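The piecewise conversion above is what `math.atan2` computes; the small Python sketch below (function names ours) implements the rule literally, together with the division formula, so both can be compared against Python's built-in complex arithmetic.

```python
import math


def to_polar(a, b):
    """Rectangular a + jb -> (r, theta) with the piecewise arctan rule above.
    theta lies in (-pi, pi]; it is None (undefined) when a = b = 0."""
    r = math.sqrt(a * a + b * b)
    if a > 0:
        theta = math.atan(b / a)
    elif a < 0 and b >= 0:
        theta = math.atan(b / a) + math.pi
    elif a < 0:
        theta = math.atan(b / a) - math.pi
    elif b > 0:
        theta = math.pi / 2
    elif b < 0:
        theta = -math.pi / 2
    else:
        theta = None
    return r, theta


def divide(a1, b1, a2, b2):
    """z1 / z2 from the rectangular formula above (z2 must be nonzero)."""
    d = a2 * a2 + b2 * b2
    return complex((a1 * a2 + b1 * b2) / d, (a2 * b1 - a1 * b2) / d)
```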



Roots of Unity Just as $j$ was defined as a square root of $-1$, we can define the principal $N$th root of unity as

$$W_N = e^{-j2\pi/N}. \tag{2.276}$$

It is easy to check that $W_N^k$, for $k \in \{2, 3, \ldots, N\}$, are also $N$th roots of unity, meaning $(W_N^k)^N = 1$. If we drew all $N$ roots of unity in the complex plane, we would see that they slice up the unit circle by equal angles; the choice of $W_N$ as the principal root makes $W_N, W_N^2, \ldots, W_N^N$ consecutive in clockwise order. Figure 2.30 shows an example with $N = 8$.

Here are some useful identities involving the roots of unity:

$$\begin{aligned}
W_N^{N} &= 1, && \text{(2.277a)}\\
W_N^{kN+n} &= W_N^{n}, \qquad k, n \in \mathbb{Z}, && \text{(2.277b)}\\
\sum_{k=0}^{N-1} W_N^{nk} &= \begin{cases} N, & n = \ell N,\ \ell \in \mathbb{Z};\\ 0, & \text{otherwise}. \end{cases} && \text{(2.277c)}
\end{aligned}$$



The last relation is often referred to as orthogonality of the roots of unity. To prove it for any $n$ not an integer multiple of $N$, use the finite sum formula from (P1.65-1):

$$\sum_{k=0}^{N-1} \big(W_N^{n}\big)^k = \frac{1 - W_N^{nN}}{1 - W_N^{n}} = 0, \tag{2.278}$$

since the numerator is 0 and the denominator is nonzero. For $n = \ell N$, $W_N^{nk} = W_N^{\ell N k} = 1$, and thus, by direct substitution into (2.277c), we get $N$.
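A quick numerical check of the roots-of-unity relations, in Python with only the standard library; `W` and `root_sum` are names introduced here for illustration. Floating-point sums come out as $N$ or as values of order $10^{-15}$ rather than exactly 0.

```python
import cmath


def W(N):
    """Principal Nth root of unity, W_N = exp(-j 2 pi / N), as in (2.276)."""
    return cmath.exp(-2j * cmath.pi / N)


def root_sum(N, n):
    """Left-hand side of (2.277c): sum_{k=0}^{N-1} W_N^{nk}."""
    return sum(W(N) ** (n * k) for k in range(N))
```

For example, `abs(root_sum(8, 3))` is numerically zero, while `root_sum(8, 16)` is 8 up to rounding.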
Figure 2.30: Roots of unity for $N = 8$ (the principal one is highlighted).



2.A.2 Difference Equations

Finding solutions to linear difference equations introduced in Section 2.3.2 involves the following steps:

(i) Homogeneous solution: First, we find a solution to the homogeneous equation,

$$y_n^{(h)} = -\sum_{k=1}^{N} a_k\, y_{n-k}^{(h)}, \tag{2.279a}$$

obtained by setting the input $x$ in (2.54) to zero. Assuming the roots $\lambda_k$ below are distinct, the solution is of the form

$$y_n^{(h)} = \sum_{k=1}^{N} \alpha_k \lambda_k^{n}, \tag{2.279b}$$

where the $\lambda_k$ are obtained by solving the characteristic equation of the system,

$$\sum_{k=0}^{N} a_k \lambda^{N-k} = 0, \tag{2.279c}$$

and we assumed that $a_0 = 1$.

(ii) Particular solution: Then, any particular solution $y_n^{(p)}$ to (2.54) is found (independently of $y_n^{(h)}$). This is typically done by assuming that $y_n^{(p)}$ is of the same form as $x_n$, possibly scaled.

(iii) Complete solution: By superposition, the complete solution $y_n$ is the sum of the homogeneous solution $y_n^{(h)}$ and the particular solution $y_n^{(p)}$:

$$y_n = y_n^{(h)} + y_n^{(p)} = \sum_{k=1}^{N} \alpha_k \lambda_k^{n} + y_n^{(p)}. \tag{2.279d}$$

We determine the coefficients $\alpha_k$ in $y_n^{(h)}$ by specifying initial conditions for $y_n$ and then solving the system.
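To illustrate steps (i) to (iii), here is a Python sketch for a second-order equation with a constant input. The coefficients, initial conditions and the sign convention $y_n + a_1 y_{n-1} + a_2 y_{n-2} = x_n$ (consistent with $a_0 = 1$ in (2.279c)) are assumptions made for the example; the closed form is compared against running the recursion directly.

```python
import cmath


def second_order_solution(a1, a2, c, y0, y1):
    """Closed-form y_n for y_n + a1*y_{n-1} + a2*y_{n-2} = c, n >= 2,
    built with steps (i)-(iii). Assumes distinct roots and 1 + a1 + a2 != 0."""
    # (i) roots of the characteristic equation lambda^2 + a1*lambda + a2 = 0
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    lam1, lam2 = (-a1 + disc) / 2, (-a1 - disc) / 2
    # (ii) particular solution of the same (constant) form as the input
    yp = c / (1 + a1 + a2)
    # (iii) alpha_1, alpha_2 from the initial conditions y_0, y_1 (Cramer's rule)
    r0, r1 = y0 - yp, y1 - yp
    det = lam2 - lam1
    alpha1 = (r0 * lam2 - r1) / det
    alpha2 = (r1 - r0 * lam1) / det
    return lambda n: alpha1 * lam1 ** n + alpha2 * lam2 ** n + yp


# Hypothetical example: y_n - (5/6) y_{n-1} + (1/6) y_{n-2} = 1 with y_0 = y_1 = 0;
# the roots are 1/2 and 1/3 and the particular solution is the constant 3.
a1, a2, c = -5 / 6, 1 / 6, 1.0
y = [0.0, 0.0]
for n in range(2, 30):
    y.append(c - a1 * y[n - 1] - a2 * y[n - 2])   # run the recursion directly
f = second_order_solution(a1, a2, c, y[0], y[1])
```

Here the fitted coefficients are $\alpha_1 = -12$ and $\alpha_2 = 9$, so $y_n = -12\,(1/2)^n + 9\,(1/3)^n + 3$.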

2.A.3 Convergence of the Convolution Sum

Recall from Appendix 1.A.2 that a doubly-infinite sum is said to converge when it converges absolutely. Thus, the convolution $h * x$ between sequences $h$ and $x$ is well defined when the sum $\sum_{k \in \mathbb{Z}} x_k h_{n-k}$ converges absolutely for every value of $n$.

When $h$ and $x$ are in $\ell^2(\mathbb{Z})$, convergence is guaranteed by the fact that the standard $\ell^2$ inner product is defined on all of $\ell^2(\mathbb{Z})$ (see Exercise 1.12). Specifically, for each $n \in \mathbb{Z}$, define the sequence $h^{(n)}$ by

$$h_k^{(n)} = h_{n-k}^* \qquad \text{for all } k \in \mathbb{Z}.$$

Then each $h^{(n)}$ is in $\ell^2(\mathbb{Z})$ because time reversal, shifting, and conjugation do not change the $\ell^2$ norm. Thus, for every $n \in \mathbb{Z}$, the inner product $\langle x, h^{(n)} \rangle = (h * x)_n$ is well defined, so the convolution is well defined.

In signal processing, we are not quite satisfied with restricting attention to sequences in $\ell^2(\mathbb{Z})$; for example, simple sequences like constants and sinusoids are not in $\ell^2(\mathbb{Z})$. Ensuring convergence of the convolution sum while loosening the constraints on $x$ requires tightening the constraints on $h$, or vice versa. Hölder's inequality for sequences (1.201) gives a simple condition: the convolution sum is guaranteed to converge absolutely when $h \in \ell^p(\mathbb{Z})$ and $x \in \ell^q(\mathbb{Z})$ with $p$ and $q$ in $[1, \infty]$ satisfying $1/p + 1/q \ge 1$.^66

We employ the $p = 1$, $q = \infty$ case often: by restricting an LSI system impulse response $h$ to $\ell^1(\mathbb{Z})$, we can allow the input $x$ to be any sequence in $\ell^\infty(\mathbb{Z})$ (that is, merely bounded) while ensuring that the output sequence $h * x$ is well defined. Note that the condition $h \in \ell^1(\mathbb{Z})$ was already used in BIBO stability of the LSI system with impulse response $h$ and is a common assumption.

2.B Elements of Algebra

2.B.1 Polynomials

A polynomial is a finite sum of the following form:

$$p(t) = \sum_{n=0}^{N} a_n t^n. \tag{2.280}$$

Assuming $a_N \neq 0$, the degree of the polynomial is $N$. The set of polynomials with coefficients $a_n$ from a given ring itself forms a ring.^67



66 Hölder's inequality is given with $p$ and $q$ as Hölder conjugates, $1/p + 1/q = 1$, and making $p$ or $q$ smaller makes the corresponding sequence space smaller; see (1.37).

67 A ring is a set together with two binary operations, addition and multiplication. The addition operation must be commutative and associative, while the multiplication must be associative, and distributive over addition. There exists an additive identity, and each element must have an additive inverse. A standard example of a ring is the set of integers, $\mathbb{Z}$.
The roots of a polynomial are obtained by equating the polynomial function $p(t)$, the function obtained by evaluating the polynomial $p(t)$ over a given domain of $t$, to zero. The following theorem, usually credited to Gauss, is a useful tool in algebra:

Theorem 2.19 (Fundamental theorem of algebra) Every polynomial of degree $N \ge 1$ with complex coefficients possesses exactly $N$ complex roots, counted with multiplicity.

Thus, the degree of the polynomial is also the number of complex roots of that polynomial. For example, $p(t) = a_2 t^2 + a_1 t + a_0$ is a quadratic polynomial and has two roots. An irreducible quadratic polynomial is a quadratic polynomial with real coefficients and no real roots. For example, $p(t) = t^2 + 2$ has no roots in the real numbers; rather, its roots are complex, $\pm j\sqrt{2}$.

This theorem holds only for polynomials in one variable. We can factor any polynomial with real coefficients into a product of linear factors $(t - b_n)$ and irreducible quadratic factors with real coefficients $(t^2 + c_n t + d_n)$, while for a polynomial with complex coefficients, the factors are all linear, $(t - z_n)$:

$$p(t) = \sum_{n=0}^{N} a_n t^n = \begin{cases}
a_N \prod_{n=0}^{N-2K-1} (t - b_n) \prod_{n=0}^{K-1} (t^2 + c_n t + d_n), & a_n, b_n, c_n, d_n \in \mathbb{R};\\
a_N \prod_{n=0}^{N-1} (t - z_n), & a_n, z_n \in \mathbb{C},
\end{cases} \tag{2.281}$$

where $K$ is the number of irreducible quadratic factors.

Two polynomials $p(t)$ and $q(t)$ are called coprime, written as $(p(t), q(t)) = 1$, when they have no common nonconstant factors. The Bézout identity states that if $p(t)$ and $q(t)$ are coprime, there exist two other polynomials $a(t)$ and $b(t)$ such that

$$a(t)p(t) + b(t)q(t) = 1, \qquad \text{for all } t. \tag{2.282}$$

Euclid's algorithm (in its extended form) is a constructive way of finding $a(t)$ and $b(t)$ in (2.282).
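Euclid's algorithm for polynomials can be sketched in Python with exact rational arithmetic (`fractions.Fraction`). Polynomials are stored as coefficient lists, lowest degree first; this representation and all function names are our own choices. The loop maintains $r_i = a_i p + b_i q$, and when the last nonzero remainder is a constant, dividing by it yields the Bézout pair of (2.282).

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; lists hold coefficients lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0)
                 for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_divmod(p, q):
    """Long division p = quot*q + rem with deg(rem) < deg(q); q must be nonzero."""
    q = trim(q)
    rem = [Fraction(c) for c in trim(p)]
    quot = [Fraction(0)] * max(len(rem) - len(q) + 1, 1)
    while len(rem) >= len(q):
        shift = len(rem) - len(q)
        coef = rem[-1] / q[-1]
        quot[shift] = coef
        rem = sub(rem, [0] * shift + [coef * c for c in q])
    return trim(quot), rem


def bezout(p, q):
    """Extended Euclid: (a, b) with a*p + b*q = 1 for coprime p, q."""
    r0, r1 = trim(p), trim(q)
    a0, a1 = [Fraction(1)], []
    b0, b1 = [], [Fraction(1)]
    while r1:                       # invariant: r_i = a_i * p + b_i * q
        quot, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, sub(a0, mul(quot, a1))
        b0, b1 = b1, sub(b0, mul(quot, b1))
    if len(r0) != 1:
        raise ValueError("p and q have a common nonconstant factor")
    g = r0[0]                       # nonzero constant gcd, normalized to 1 below
    return [c / g for c in a0], [c / g for c in b0]


# t^2 + 1 and t + 1 are coprime
a, b = bezout([1, 0, 1], [1, 1])
# a = [Fraction(1, 2)], b = [Fraction(1, 2), Fraction(-1, 2)], i.e. a(t) = 1/2, b(t) = (1 - t)/2
```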

Laurent Polynomials A Laurent polynomial is like a polynomial except that negative powers are allowed in (2.280):

$$p(t) = \sum_{n=-M}^{N} a_n t^n. \tag{2.283}$$

This can be written as

$$p(t) = t^{-M} q(t) \quad \text{with} \quad q(t) = \sum_{n=0}^{N+M} a_{n-M}\, t^n, \tag{2.284}$$

where $q(t)$ is now just an ordinary polynomial.
Ratios of Polynomials A rational function $r(t)$ is a ratio of two polynomials

$$r(t) = \frac{p(t)}{q(t)} = \frac{\sum_{n=0}^{N} a_n t^n}{\sum_{m=0}^{M} b_m t^m}. \tag{2.285}$$

In general, we assume that $M \ge N$, as otherwise we could use polynomial division to write (2.285) as a sum of a polynomial and a ratio of polynomials with the numerator now of degree smaller than $M$.

Assume that $p(t)$ and $q(t)$ are coprime; otherwise, we can cancel common factors and proceed. By Theorem 2.19, (2.285) has $N$ zeros and $M$ poles (zeros of the denominator $q(t)$) in the complex plane. When $M > N$, there are $M - N$ additional zeros at $t = \infty$; when $M < N$, there are $N - M$ additional poles at $t = \infty$. Thus, a rational function has $\max(M, N)$ poles and $\max(M, N)$ zeros, counting those at $t = \infty$.

Discrete Polynomials A polynomial sequence is a sequence whose $n$th element is a finite sum of the following form:

$$p_n = \sum_{k=0}^{N} a_k n^k, \qquad n \in \mathbb{Z}. \tag{2.286}$$

For example, a constant polynomial sequence is of the form $p_n = a$, a linear polynomial sequence is of the form $p_n = a_0 + a_1 n$, and a quadratic polynomial sequence is of the form $p_n = a_0 + a_1 n + a_2 n^2$. The z-transform of such a sequence is:

$$P(z) = \sum_{n \in \mathbb{Z}} p_n z^{-n} = \sum_{n \in \mathbb{Z}} \Big(\sum_{k=0}^{N} a_k n^k\Big) z^{-n}.$$

When we study wavelets and filter banks, we will be concerned with the moment annihilating/preserving properties of such systems. The following fact will then be of use: convolution of the polynomial sequence with a differencing filter $d_n = (\delta_n - \delta_{n-1})$, or, multiplication of $P(z)$ by $D(z) = (1 - z^{-1})$, reduces the degree of the polynomial by 1, as in

$$\begin{aligned}
D(z)P(z) &= (1 - z^{-1}) \sum_n p_n z^{-n} = (1 - z^{-1}) \sum_n \Big(\sum_{k=0}^{N} a_k n^k\Big) z^{-n} \\
&= \sum_n \sum_{k=0}^{N} a_k n^k z^{-n} - \sum_n \sum_{k=0}^{N} a_k n^k z^{-(n+1)} \\
&= \sum_n \sum_{k=0}^{N} a_k n^k z^{-n} - \sum_n \sum_{k=0}^{N} a_k (n-1)^k z^{-n} \\
&= \sum_n \Big(\sum_{k=0}^{N} a_k \big(n^k - (n-1)^k\big)\Big) z^{-n} \overset{(a)}{=} \sum_n \Big(\sum_{k=0}^{N-1} b_k n^k\Big) z^{-n} = \sum_n r_n z^{-n},
\end{aligned}$$

where $r_n$ is a polynomial of degree $(N - 1)$, and (a) follows from $(n^N - (n-1)^N)$ being a polynomial of degree $(N - 1)$. The above process can be seen as applying a differencing filter with a zero at $z = 1$. Extending the above argument, we see that by repeatedly applying the differencing filter, we will kill the constant term, then the linear, then the quadratic, and so on: $N + 1$ applications annihilate a polynomial sequence of degree $N$.
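A minimal Python sketch of repeated differencing on a finite window of a polynomial sequence; the particular sequence $p_n = 2 + 3n - n^2$ is chosen only for illustration. Each pass lowers the degree by one, so three passes annihilate this quadratic.

```python
def difference(p):
    """Apply d_n = delta_n - delta_{n-1}: r_n = p_n - p_{n-1}.
    On a finite window the first output (which needs p_{-1}) is dropped."""
    return [p[i] - p[i - 1] for i in range(1, len(p))]


# Quadratic polynomial sequence p_n = 2 + 3n - n^2 on n = -5, ..., 5 (illustrative)
ns = list(range(-5, 6))
p = [2 + 3 * n - n * n for n in ns]
d1 = difference(p)    # degree 1: 4 - 2n
d2 = difference(d1)   # degree 0: the constant -2
d3 = difference(d2)   # N + 1 = 3 applications: all zeros
print(d2, d3)  # [-2, -2, -2, -2, -2, -2, -2, -2, -2] [0, 0, 0, 0, 0, 0, 0, 0]
```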

2.B.2 Vectors and Matrices of Polynomials

Notions of vectors and matrices can be combined with polynomials and rational functions, giving what is called either a vector/matrix of polynomials or a polynomial vector/matrix. For simplicity, we introduce all concepts on $2 \times 1$ vectors and $2 \times 2$ matrices. A vector of polynomials, or polynomial vector, is given by

$$v(t) = \begin{bmatrix} p(t) \\ q(t) \end{bmatrix} = \begin{bmatrix} \sum_{n=0}^{N} a_n t^n \\ \sum_{n=0}^{N} b_n t^n \end{bmatrix} = \sum_{n=0}^{N} v_n t^n, \tag{2.287}$$

where $v_n$ are $2 \times 1$ vectors of scalars.

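Similarly, a matrix of polynomials, or polynomial matrix, is given by

$$H(t) = \begin{bmatrix} p(t) & q(t) \\ r(t) & s(t) \end{bmatrix} = \begin{bmatrix} \sum_{n=0}^{N} a_n t^n & \sum_{n=0}^{N} b_n t^n \\ \sum_{n=0}^{N} c_n t^n & \sum_{n=0}^{N} d_n t^n \end{bmatrix} = \sum_{n=0}^{N} H_n t^n, \tag{2.288}$$

where $H_n$ are $2 \times 2$ matrices of scalars. In both of the above expressions, $N$ is the maximum degree of any of the entries.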

Rank is more subtle for polynomial matrices than for ordinary ones. For example, the matrix

$$H(t) = \begin{bmatrix} a + bt & 3(a + bt) \\ c + dt & \lambda(c + dt) \end{bmatrix}$$

is rank deficient for every value of $t$ if $\lambda = 3$. On the other hand, if $\lambda \neq 3$, then it is rank deficient only if $t = -a/b$ or $t = -c/d$, leading to the notion of normal rank. The normal rank of $H(t)$ is the largest of the orders of the minors that have a determinant not identically zero. In the above example, for $\lambda = 3$, the normal rank is 1, while for $\lambda \neq 3$, the normal rank is 2.

A square polynomial matrix of full normal rank has an inverse, computed similarly to a scalar matrix as in (1.205):

$$H^{-1}(t) = \frac{\operatorname{adj}(H(t))}{\det(H(t))}. \tag{2.289}$$



A polynomial matrix H(t) is called unimodular if | det H(t)\ = 1 for all t. The 
product of two unimodular matrices is unimodular, and the inverse of a unimodular 
matrix is unimodular as well. A polynomial matrix is unimodular if and only if its 
inverse is a polynomial matrix. All these facts can be proven using properties of 
determinants. 
Example 2.42 (Unimodular polynomial matrix) The determinant of the polynomial matrix

$$H(t) = \begin{bmatrix} 1 + t & 2 + t \\ t & 1 + t \end{bmatrix}$$

is $\det H(t) = (1 + t)^2 - t(2 + t) = 1$; it is thus unimodular. Its inverse is

$$H^{-1}(t) = \begin{bmatrix} 1 + t & -(2 + t) \\ -t & 1 + t \end{bmatrix},$$

also a unimodular polynomial matrix.
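The claims of Example 2.42 for $H(t) = \begin{bmatrix} 1+t & 2+t \\ t & 1+t \end{bmatrix}$ can be checked exactly in Python with rational arithmetic (function names ours). Because every entry involved is a polynomial in $t$ of degree at most 2, agreement at three or more distinct points proves the identities for all $t$.

```python
from fractions import Fraction


def H(t):
    return [[1 + t, 2 + t], [t, 1 + t]]


def H_inv(t):
    return [[1 + t, -(2 + t)], [-t, 1 + t]]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# det H(t) and the entries of H(t) H^{-1}(t) have degree at most 2 in t,
# so agreement at three or more distinct points proves the identities for all t.
points = [Fraction(-3), Fraction(1, 2), Fraction(7, 3), Fraction(10)]
checks = [(det2(H(t)), matmul2(H(t), H_inv(t))) for t in points]
```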



Vectors and Matrices of Laurent Polynomials Just as polynomials can be extended to Laurent polynomials, vectors/matrices of polynomials can be extended to vectors/matrices of Laurent polynomials,

$$H(t) = \begin{bmatrix} \sum_{n=-N}^{N} a_n t^n & \sum_{n=-N}^{N} b_n t^n \\ \sum_{n=-N}^{N} c_n t^n & \sum_{n=-N}^{N} d_n t^n \end{bmatrix} = \sum_{n=-N}^{N} H_n t^n,$$

and similarly for vectors. The normal rank is defined as for polynomial matrices. A Laurent polynomial matrix $H(t)$ is called Laurent unimodular if $\det H(t) = c\,t^k$ for all $t$, for some nonzero $c \in \mathbb{C}$ and some $k \in \mathbb{Z}$. The inverse of a Laurent polynomial matrix is again a Laurent polynomial matrix exactly when the matrix is Laurent unimodular: the adjugate in (2.289) is always a Laurent polynomial matrix, and dividing it by the determinant keeps it so when the determinant is a monomial.

Example 2.43 (Laurent unimodular polynomial matrix) The determinant of the Laurent polynomial matrix

$$H(t) = \frac{1}{4t} \begin{bmatrix} 1 + 3t & 1 - 3t \\ 3 + t & 3 - t \end{bmatrix}$$

is $t^{-1}$; it is thus Laurent unimodular. Its inverse is

$$H^{-1}(t) = \frac{1}{4} \begin{bmatrix} 3 - t & -(1 - 3t) \\ -(3 + t) & 1 + 3t \end{bmatrix},$$

also a Laurent unimodular polynomial matrix.



Vectors and Matrices of Ratios of Polynomials A rational matrix, or matrix of rational functions, has entries that are ratios of polynomials,

$$H(t) = \begin{bmatrix} \dfrac{p_{00}(t)}{q_{00}(t)} & \dfrac{p_{01}(t)}{q_{01}(t)} \\[2mm] \dfrac{p_{10}(t)}{q_{10}(t)} & \dfrac{p_{11}(t)}{q_{11}(t)} \end{bmatrix},$$

where $p_{ij}(t), q_{ij}(t)$ are polynomials in $t$. The normal rank is defined as for polynomial matrices. The inverse of a rational matrix of full normal rank is again a rational matrix. In Chapter 7, we will see a connection between polynomial matrices and FIR discrete-time filters, and a connection between rational matrices and IIR filters.
Adjoint of a Polynomial Vector or Matrix We now discuss the adjoint of a vector or matrix of polynomials; extensions to matrices of Laurent polynomials and rational functions follow similarly. When defining the adjoint of a vector or matrix of polynomials, we must ensure that the product of the adjoint with the original is positive semidefinite on the unit circle, that is, for $|t| = 1$.^68 This is because we are extending the idea of autocorrelation (2.142) to vectors and matrices.

For simplicity, we consider polynomials with real coefficients first. The adjoint of a $2 \times 1$ vector of polynomials (2.287) is

$$v(t)^* = v(t^{-1})^T = \begin{bmatrix} p(t^{-1}) & q(t^{-1}) \end{bmatrix}. \tag{2.290}$$

The product

$$v(t)^* v(t) = \begin{bmatrix} p(t^{-1}) & q(t^{-1}) \end{bmatrix} \begin{bmatrix} p(t) \\ q(t) \end{bmatrix} = p(t^{-1})p(t) + q(t^{-1})q(t)$$

is positive semidefinite on the unit circle $|t| = 1$, since it is the sum of two positive semidefinite functions on the unit circle (for real coefficients and $|t| = 1$, $p(t^{-1})p(t) = |p(t)|^2$). The same holds for matrices of polynomials: the adjoint of a $2 \times 2$ matrix of polynomials (2.288) is

$$H(t)^* = H(t^{-1})^T = \begin{bmatrix} p(t^{-1}) & r(t^{-1}) \\ q(t^{-1}) & s(t^{-1}) \end{bmatrix}, \tag{2.291}$$

and the product $H(t)^* H(t)$ is a matrix of Laurent polynomials, positive semidefinite on the unit circle.

An extension of the spectral factorization Corollary 2.14 states:

Theorem 2.20 Let $A(t)$ be a matrix of Laurent polynomials. Then, $A(t)$ is positive semidefinite on the unit circle, that is, $A(e^{j\omega}) \ge 0$, if and only if

$$A(t) = H(t)\, H^T(t^{-1}), \tag{2.292}$$

where $H(t)$ is a matrix of polynomials.

Such a matrix is given in Example 2.36 and its factorization in (2.250).

When polynomial coefficients are complex, the adjoint of the matrix (2.288) is defined as

$$ H(t)^* = H_*(t^{-1}) = \begin{bmatrix} p_*(t^{-1}) & r_*(t^{-1}) \\ q_*(t^{-1}) & s_*(t^{-1}) \end{bmatrix} = \sum_{n=0}^{N} H_n^*\, t^{-n}, \tag{2.293} $$

where $H(t) = \sum_{n=0}^{N} H_n t^n$, and the subscript $*$ means Hermitian transpose of the coefficient matrices $H_n$ without conjugating $t$, as shown on the right-hand side of the equation.



$^{68}$ Our use of polynomial matrices is typically with the z-transform, that is, $H(t) = H(z^{-1})$, and thus, on the unit circle, the polynomial turns into the DTFT.
Paraunitary Matrices The extension of unitary matrices (1.218) to matrices of polynomials or rational functions is the class of paraunitary matrices. A square matrix of polynomials or rational functions $U(t)$ is called paraunitary when it satisfies

$$ U^*(t)\,U(t) = U_*(t^{-1})\,U(t) = I. \tag{2.294a} $$

Its inverse $U^{-1}(t)$ equals its adjoint (Hermitian transpose), $U^*(t) = U_*(t^{-1})$. A real paraunitary matrix satisfies

$$ U^T(t^{-1})\,U(t) = I. \tag{2.294b} $$

A paraunitary matrix is unitary on the unit circle, that is,

$$ U^*(e^{j\omega})\,U(e^{j\omega}) = I. \tag{2.294c} $$

When all matrix entries are polynomials, the matrix is called lossless.

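Example 2.44 (Paraunitary matrix) The following matrix

$$ U(t) = \frac{1}{\sqrt{2}} \begin{bmatrix} \dfrac{1+\sqrt{3}\,t}{2} & \dfrac{1-\sqrt{3}\,t}{2} \\[2mm] \dfrac{-\sqrt{3}+t}{2} & \dfrac{-\sqrt{3}-t}{2} \end{bmatrix} $$

is paraunitary, since (2.294b) is satisfied. For each column $u_i(t)$, the terms in $t$ and $t^{-1}$ of $u_i^T(t^{-1})\,u_i(t)$ cancel, and the constant terms add to $\tfrac{1}{8}(1 + 3 + 3 + 1) = 1$. For the cross term, the powers of $t$ again cancel, and $u_0^T(t^{-1})\,u_1(t) = \tfrac{1}{8}(1 - 3 + 3 - 1) = 0$.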
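Because (2.294b) is an identity between Laurent polynomials, it should hold at every nonzero $t$, not only on the unit circle. The following sketch evaluates the matrix of Example 2.44 at a few arbitrarily chosen points (the test points are an assumption made for illustration). It also checks (2.294c) at one point on the unit circle:

```python
import numpy as np

SQRT3 = np.sqrt(3.0)

def U(t):
    """Matrix of Example 2.44 evaluated at the (possibly complex) point t."""
    return (1 / np.sqrt(2)) * np.array([[(1 + SQRT3 * t) / 2, (1 - SQRT3 * t) / 2],
                                        [(-SQRT3 + t) / 2, (-SQRT3 - t) / 2]])

# (2.294b): U^T(t^{-1}) U(t) = I, transposing without conjugating t.
points = [0.5, 2.0, -3.0, 0.3 + 0.7j]
err_real = max(np.abs(U(1 / t).T @ U(t) - np.eye(2)).max() for t in points)

# (2.294c): on the unit circle the matrix is unitary, U^*(e^{jw}) U(e^{jw}) = I.
z = np.exp(1j * 0.9)
err_circle = np.abs(U(z).conj().T @ U(z) - np.eye(2)).max()

print(err_real, err_circle)
```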



Pseudocirculant Polynomial Matrices The extension of circulant matrices (1.227) to polynomial matrices is the class of pseudocirculant matrices, an example of which is (2.216). Such a matrix has polynomial entries and is circulant, except that the entries above the diagonal are multiplied by $t$ (thus pseudocirculant); for example,

$$ H(t) = \begin{bmatrix} h_0(t) & t\,h_2(t) & t\,h_1(t) \\ h_1(t) & h_0(t) & t\,h_2(t) \\ h_2(t) & h_1(t) & h_0(t) \end{bmatrix}. \tag{2.295} $$

2.B.3 Kronecker Product



The Kronecker product of two matrices is defined as follows (we show a $2 \times 2$ matrix as an example):

$$ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \otimes M = \begin{bmatrix} aM & bM \\ cM & dM \end{bmatrix}, \tag{2.296} $$

where $a$, $b$, $c$ and $d$ are scalars and $M$ is a matrix (neither matrix need be square). The Kronecker product has the following useful property with respect to the usual matrix product:

$$ (A \otimes B)(C \otimes D) = (AC) \otimes (BD), \tag{2.297} $$

where all the matrix products have to be well defined. See Exercise 2.8 for an application of Kronecker products.
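Both the definition (2.296) and the mixed-product property (2.297) are easy to check numerically. This sketch uses NumPy's `np.kron`, which follows the block convention of (2.296): each entry of the left factor multiplies the whole right factor. The random matrix sizes are arbitrary choices that make all products well defined:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((2, 5))

# (2.296): [[a, b], [c, d]] (x) M = [[aM, bM], [cM, dM]]
M = rng.standard_normal((2, 3))
K = np.array([[1.0, 2.0], [3.0, 4.0]])
blocks = np.block([[1.0 * M, 2.0 * M], [3.0 * M, 4.0 * M]])

# (2.297): (A (x) B)(C (x) D) = (AC) (x) (BD)
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
print(np.allclose(np.kron(K, M), blocks), np.allclose(lhs, rhs))
```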
Chapter at a Glance

We now summarize the main concepts and results seen in this chapter, some in tabular form. One of the key elements was finding the appropriate Fourier transform for a given space of sequences (such as $\ell^2(\mathbb{Z})$ or finite-length ones circularly extended). That procedure can be summarized as follows:

(i) Start with a given time shift $\delta_{n-1}$;
(ii) Induce an appropriate convolution operator $Tx = Hx = h * x$;
(iii) Find the eigensequences $x_n$ of $H$ ($e^{j\omega n}$ for infinite-length sequences and $e^{j(2\pi/N)kn}$ for finite-length sequences);
(iv) Identify the frequency response as the eigenvalue corresponding to the above eigensequence ($H(e^{j\omega})$ for infinite-length sequences, $H_k$ for finite-length sequences);
(v) Find the appropriate Fourier transform by projecting the sequence onto the spaces spanned by the eigensequences identified in (iii) (discrete-time Fourier transform for infinite-length sequences, discrete Fourier transform for finite-length sequences).
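Steps (iii) and (iv) can be checked numerically for finite-length sequences. If $h$ is a random filter (hypothetical), circularly convolving it with $v_k = e^{j(2\pi/N)kn}$ should return $H_k v_k$, where $H_k$ is the DFT of $h$. NumPy's `np.fft.fft` uses the same sign convention, $\sum_n h_n e^{-j(2\pi/N)kn}$:

```python
import numpy as np

def circular_convolution(h, x):
    """(h * x)_n = sum_k x_k h_{(n-k) mod N} for length-N sequences."""
    N = len(x)
    return np.array([sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)])

N = 8
rng = np.random.default_rng(2)
h = rng.standard_normal(N)      # hypothetical impulse response
H = np.fft.fft(h)               # frequency response H_k (eigenvalues)
n = np.arange(N)

max_err = 0.0
for k in range(N):
    v_k = np.exp(1j * 2 * np.pi * k * n / N)   # eigensequence from step (iii)
    max_err = max(max_err, np.abs(circular_convolution(h, v_k) - H[k] * v_k).max())
print(max_err)
```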



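| Concept | Notation | Infinite-length sequences | Finite-length sequences |
|---|---|---|---|
| Shift | $\delta_{n-1}$ | linear | circular |
| Sequence | vector | $x_n$, $n \in \mathbb{Z}$ | $x_n$, $n \in \{0, 1, \ldots, N-1\}$ |
| LSI system | filter, impulse response, operator | $h_n$, $n \in \mathbb{Z}$ | $h_n$, $n \in \{0, 1, \ldots, N-1\}$ |
| Convolution | $h * x$ | $\sum_{k \in \mathbb{Z}} x_k h_{n-k}$ | $\sum_{k=0}^{N-1} x_k h_{(n-k) \bmod N}$ |
| Eigensequence | satisfies; invariant space | $v_{\omega,n} = e^{j\omega n}$; $h * v_\omega = H(e^{j\omega})\,v_\omega$; $S_\omega = \{\alpha e^{j\omega n}\}$, $\alpha \in \mathbb{C}$, $\omega \in \mathbb{R}$ | $v_{k,n} = e^{j(2\pi/N)kn}$; $h * v_k = H_k v_k$; $S_k = \{\alpha e^{j(2\pi/N)kn}\}$, $\alpha \in \mathbb{C}$, $k \in \mathbb{Z}$ |
| Frequency response | eigenvalue | $\lambda_\omega = H(e^{j\omega}) = \sum_{n \in \mathbb{Z}} h_n e^{-j\omega n}$ | $\lambda_k = H_k = \sum_{n=0}^{N-1} h_n e^{-j(2\pi/N)kn}$ |
| Fourier transform | spectrum | DTFT: $X(e^{j\omega}) = \sum_{n \in \mathbb{Z}} x_n e^{-j\omega n}$ | DFT: $X_k = \sum_{n=0}^{N-1} x_n e^{-j(2\pi/N)kn}$ |

Table 2.11: Concepts in discrete-time processing.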
| Concept | Expression |
|---|---|
| Sampling factor 2 | |
| Input | $x_n$, $X(z)$, $X(e^{j\omega})$ |
| Downsampling by 2 | $y_n = x_{2n}$; $y = D_2 x$; $Y(z) = \tfrac12\big[X(z^{1/2}) + X(-z^{1/2})\big]$; $Y(e^{j\omega}) = \tfrac12\big[X(e^{j\omega/2}) + X(e^{j(\omega-2\pi)/2})\big]$ |
| Upsampling by 2 | $y_n = x_{n/2}$ for $n$ even, $0$ otherwise; $y = U_2 x$; $Y(z) = X(z^2)$; $Y(e^{j\omega}) = X(e^{j2\omega})$ |
| Filtering by $h$ & downsampling by 2 | $y = D_2 H x$; $Y(z) = \tfrac12\big[H(z^{1/2})X(z^{1/2}) + H(-z^{1/2})X(-z^{1/2})\big]$; $Y(e^{j\omega}) = \tfrac12\big[H(e^{j\omega/2})X(e^{j\omega/2}) + H(e^{j(\omega-2\pi)/2})X(e^{j(\omega-2\pi)/2})\big]$ |
| Upsampling by 2 & filtering by $g$ | $y = G U_2 x$; $Y(z) = G(z)X(z^2)$; $Y(e^{j\omega}) = G(e^{j\omega})X(e^{j2\omega})$ |
| Sampling factor $N$ | |
| Input | $x_n$, $X(z)$, $X(e^{j\omega})$ |
| Downsampling by $N$ | $y_n = x_{Nn}$; $y = D_N x$; $Y(z) = \tfrac1N \sum_{k=0}^{N-1} X(W_N^k z^{1/N})$; $Y(e^{j\omega}) = \tfrac1N \sum_{k=0}^{N-1} X(e^{j(\omega-2\pi k)/N})$ |
| Upsampling by $N$ | $y_n = x_{n/N}$ for $n = \ell N$, $0$ otherwise; $y = U_N x$; $Y(z) = X(z^N)$; $Y(e^{j\omega}) = X(e^{jN\omega})$ |

Table 2.12: Concepts in multirate discrete-time processing.
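The downsampling and upsampling entries of Table 2.12 can be spot-checked with a finite-support input. For such an input, the DTFT is a finite sum and can be evaluated exactly at any frequency. The input values and test frequencies below are arbitrary:

```python
import numpy as np

def dtft(x, w):
    """DTFT of a sequence supported on n = 0, ..., len(x) - 1, evaluated at frequency w."""
    n = np.arange(len(x))
    return np.sum(x * np.exp(-1j * w * n))

rng = np.random.default_rng(3)
x = rng.standard_normal(24)
freqs = (0.3, 1.7, -2.5)

errors = []
for N in (2, 3, 4):
    y = x[::N]                                   # downsampling: y_n = x_{Nn}
    u = np.zeros(N * len(x)); u[::N] = x         # upsampling: u_{Nn} = x_n, zeros elsewhere
    for w in freqs:
        # Y(e^{jw}) = (1/N) sum_k X(e^{j(w - 2 pi k)/N})
        rhs = sum(dtft(x, (w - 2 * np.pi * k) / N) for k in range(N)) / N
        errors.append(abs(dtft(y, w) - rhs))
        # Upsampled: U(e^{jw}) = X(e^{jNw})
        errors.append(abs(dtft(u, w) - dtft(x, N * w)))
print(max(errors))
```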
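Sequences

| Domain | Autocorrelation | Crosscorrelation | Properties |
|---|---|---|---|
| Time | $a_n = \sum_{k \in \mathbb{Z}} x_k x^*_{k-n}$ | $c_n = \sum_{k \in \mathbb{Z}} x_k y^*_{k-n}$ | $a_n = a^*_{-n}$; $c_{x,y,n} = c^*_{y,x,-n}$ |
| DTFT | $A(e^{j\omega}) = \lvert X(e^{j\omega})\rvert^2$ | $C(e^{j\omega}) = X(e^{j\omega})\,Y^*(e^{j\omega})$ | $A(e^{j\omega}) = A^*(e^{j\omega})$; $C_{x,y}(e^{j\omega}) = C^*_{y,x}(e^{j\omega})$ |
| z-transform | $A(z) = X(z)\,X_*(z^{-1})$ | $C(z) = X(z)\,Y_*(z^{-1})$ | $A(z) = A_*(z^{-1})$; $C_{x,y}(z) = C_{y,x,*}(z^{-1})$ |
| DFT | $A_k = \lvert X_k\rvert^2$ | $C_k = X_k Y_k^*$ | $A_k = A_k^*$; $C_{x,y,k} = C^*_{y,x,k}$ |

Real sequences

| Domain | Autocorrelation | Crosscorrelation | Properties |
|---|---|---|---|
| Time | $a_n = \sum_{k \in \mathbb{Z}} x_k x_{k-n}$ | $c_n = \sum_{k \in \mathbb{Z}} x_k y_{k-n}$ | $a_n = a_{-n}$; $c_{x,y,n} = c_{y,x,-n}$ |
| DTFT | $A(e^{j\omega}) = \lvert X(e^{j\omega})\rvert^2$ | $C(e^{j\omega}) = X(e^{j\omega})\,Y(e^{-j\omega})$ | $A(e^{j\omega}) = A(e^{-j\omega})$; $C_{x,y}(e^{j\omega}) = C_{y,x}(e^{-j\omega})$ |
| z-transform | $A(z) = X(z)\,X(z^{-1})$ | $C(z) = X(z)\,Y(z^{-1})$ | $A(z) = A(z^{-1})$; $C_{x,y}(z) = C_{y,x}(z^{-1})$ |
| DFT | $A_k = \lvert X_k\rvert^2$ | $C_k = X_k Y_k^*$ | $A_k = A_{-k \bmod N}$; $C_{x,y,k} = C_{y,x,-k \bmod N}$ |

Vector of sequences $x = [x_0\;\; x_1]^T$

| Domain | Autocorrelation matrix | Properties (complex; real) |
|---|---|---|
| Time | $A_n = \begin{bmatrix} a_{0,n} & c_{0,1,n} \\ c_{1,0,n} & a_{1,n} \end{bmatrix}$ | $A_n = A^*_{-n}$; $A_n = A^T_{-n}$ |
| DTFT | $A(e^{j\omega}) = \begin{bmatrix} A_0(e^{j\omega}) & C_{0,1}(e^{j\omega}) \\ C_{1,0}(e^{j\omega}) & A_1(e^{j\omega}) \end{bmatrix}$ | $A(e^{j\omega}) = A^*(e^{j\omega})$; $A(e^{j\omega}) = A^T(e^{-j\omega})$ |
| z-transform | $A(z) = \begin{bmatrix} A_0(z) & C_{0,1}(z) \\ C_{1,0}(z) & A_1(z) \end{bmatrix}$ | $A(z) = A_*(z^{-1})$; $A(z) = A^T(z^{-1})$ |
| DFT | $A_k = \begin{bmatrix} A_{0,k} & C_{0,1,k} \\ C_{1,0,k} & A_{1,k} \end{bmatrix}$ | $A_k = A_k^*$; $A_k = A^T_{-k \bmod N}$ |

Table 2.13: Summary of concepts related to deterministic autocorrelation and crosscorrelation of a sequence (upper half) and a vector of sequences (lower half). For vectors of sequences, an example for a vector of two sequences is given, and the second entry in properties is for real sequences. Note how we overload $A(e^{j\omega})$, $A(z)$ and $A_k$ to mean both a scalar and a matrix depending on the sequence.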
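For finite-length sequences with circular shifts, the DFT rows of Table 2.13 state that the DFT of the circular autocorrelation is $|X_k|^2$ and that of the crosscorrelation is $X_k Y_k^*$. Here is a small numerical check, assuming circular indexing $(k - n) \bmod N$ in the time-domain sums and random complex test data:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
idx = np.arange(N)

def circ_corr(u, v):
    """c_n = sum_k u_k v*_{(k - n) mod N}, n = 0, ..., N-1."""
    return np.array([np.sum(u * np.conj(v[(idx - n) % N])) for n in range(N)])

a = circ_corr(x, x)
c = circ_corr(x, y)
X, Y = np.fft.fft(x), np.fft.fft(y)

err_A = np.abs(np.fft.fft(a) - np.abs(X) ** 2).max()      # A_k = |X_k|^2
err_C = np.abs(np.fft.fft(c) - X * np.conj(Y)).max()       # C_k = X_k Y_k^*
err_herm = np.abs(a - np.conj(a[(-idx) % N])).max()        # a_n = a^*_{-n}
print(err_A, err_C, err_herm)
```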
Historical Remarks

Cooley-Tukey FFT The practical impact of signal processing is perhaps due in large part to the advent of fast Fourier transform algorithms, spurred by the paper of Cooley and Tukey in 1965 [32]. Their algorithm breaks the computation of the discrete Fourier transform of length $N = N_1 N_2$ into a recursive computation of smaller DFTs of lengths $N_1$ and $N_2$, respectively (see Section 2.9.1). Unbeknownst to them, a similar algorithm had been published by Gauss some 150 years earlier, in his attempt to track asteroid trajectories.

MP3, JPEG and MPEG These household names are all algorithms for coding and compressing various types of data (audio, images and video). The basic ideas in all three stem from transform-based work in signal processing, called subband coding, which is closely related to wavelets, as we will see in later chapters.

Further Reading

Books and Textbooks A standard book on discrete-time processing is the one by Oppenheim and Schafer [108]. A more recent one, by Prandoni and one of the co-authors of this book, Vetterli, views signal processing as needed for communications [115]. For statistical signal processing, see the text by Porat [113]. An early account of multirate signal processing was given by Crochiere and Rabiner [34]; a more recent one is by Vaidyanathan [158]. Dudgeon and Mersereau cover multidimensional signal processing in [18]. Blahut in [12] discusses fast algorithms for discrete-time signal processing, and in particular, various classes of FFTs.

Inverse z-Transform Via Contour Integration The formal inversion process for the z-transform is given by contour integration using Cauchy's integral formula when $X(z)$ is a rational function of $z$. When $X(z)$ is not rational, inversion can be quite difficult. A short account of inversion using contour integration is given in [89]; more details can be found in [108].



Filter Design Numerous filter design techniques exist. They all try to approximate the desired specifications of the system/filter by a realizable discrete-time system/filter. For IIR filters, one of the standard methods is to design the discrete-time filters from continuous-time ones using the bilinear transformation. For FIR filters, windowing is often used to approximate the desired response by truncating it with a window, a topic we touched upon in Example 2.41. Linear phase is then often incorporated as a design requirement. The Kaiser window design method is a standard method for FIR filter design using windowing. Another commonly used design method is the Parks-McClellan method. An excellent overview of filter design techniques is given in [108].

Algebraic Theory of Signal Processing The algebraic theory of signal processing is a recent development whose foundations can be found in [116]. It provides a framework for signal processing in algebraic terms, and allows for the development of signal models other than those based on time. For example, by introducing extensions different from the circular one we discussed in Section 2.6 (in particular, the symmetric one), one can show that the appropriate transform is the well-known DCT. Moreover, it provides a recipe for building a signal model based on sequence and filter spaces, appropriate convolutions and appropriate Fourier transforms. As such, the existence and form of fast algorithms for computing such transforms (among which are the well-known trigonometric ones) follow automatically. Some of the observations in this chapter were inspired by this algebraic framework.

Pseudocirculant Matrices We have seen the important role these matrices play in multirate systems, in particular in representing convolution in the polyphase domain. A thorough presentation of such matrices can be found in [161].

Stochastic Multirate Systems In Section 2.8.4 we examined only the most basic operations in stochastic multirate systems. A thorough discussion of various other special cases, including different periods for cyclostationarity of the input and the output, can be found in [126].



Exercises with Solutions

2.1. Properties of Periodic Sequences

Consider the complex exponential sequences of the form

$$ x_n = e^{j\omega_0 n}. $$

(i) Show that if $\alpha = \omega_0/2\pi$ is a rational number, $\alpha = p/q$, with $p$ and $q$ coprime integers, then $x_n$ is periodic with period $q$.
(ii) Show that if $\alpha = \omega_0/2\pi$ is irrational, then $x_n$ is not periodic.
(iii) Show that if $x$ and $y$ are two periodic sequences with periods $N$ and $M$, respectively, then $x + y$ is periodic with period $\operatorname{lcm}(M, N)$.

Solution:

(i) If $\alpha = \omega_0/2\pi = p/q$ with $p, q \in \mathbb{Z}$ and $\gcd(p, q) = 1$, then $\omega_0 = 2\pi\alpha = 2\pi p/q$, and

$$ x_{n+q} = e^{j(2\pi p/q)(n+q)} = e^{j(2\pi p/q)n}\, e^{j2\pi p} = e^{j(2\pi p/q)n} = x_n, $$

and thus, $x_n$ is periodic with period $q$.
(ii) Suppose there exists a nonzero $N \in \mathbb{Z}$ such that $x_n = x_{n+N}$. Then

$$ e^{j2\pi\alpha n} = e^{j2\pi\alpha(n+N)} \;\Rightarrow\; e^{j2\pi\alpha N} = 1 \;\Rightarrow\; \alpha N = M \in \mathbb{Z} \;\Rightarrow\; \alpha = \frac{M}{N} \in \mathbb{Q}, $$

which contradicts the assumption that $\alpha$ is irrational.
(iii) If $x_n = x_{n+N}$ and $y_n = y_{n+M}$, then clearly $z_n = z_{n+P}$, where $z_n = x_n + y_n$ and $P = \operatorname{lcm}(M, N)$, since $x_n = x_{n+P}$ and $y_n = y_{n+P}$.

Let us look at an example showing that the sequence $z_n$ cannot, in general, have a smaller period. Consider $x_n = n \bmod N$ and $y_n = j\,(n \bmod M)$. If $z_n = x_n + y_n$ is periodic with period $P$, then

$$ \begin{aligned} & (n \bmod N) + j\,(n \bmod M) = ((n+P) \bmod N) + j\,((n+P) \bmod M) \\ \Rightarrow\; & (n \bmod N) = ((n+P) \bmod N) \text{ and } (n \bmod M) = ((n+P) \bmod M) \\ \Rightarrow\; & P \bmod N = 0 \text{ and } P \bmod M = 0 \\ \Rightarrow\; & P = \operatorname{lcm}(M, N)\,K \end{aligned} $$

for some $K \in \mathbb{N}$. Hence, the smallest possible period for $z_n$ is $\operatorname{lcm}(M, N)$.
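Here is a numerical illustration of Exercise 2.1. It is not a proof, since a finite window can only be consistent with a period, never establish one. The helper `smallest_period` is hypothetical and introduced only for this sketch. The choices $p/q = 3/8$, $N = 4$ and $M = 6$ are arbitrary:

```python
import numpy as np
from math import gcd

def smallest_period(x, max_period):
    """Smallest P <= max_period with x[n + P] == x[n] over the available window, else None."""
    for P in range(1, max_period + 1):
        if np.allclose(x[P:], x[:-P]):
            return P
    return None

n = np.arange(200)

# (i) alpha = p/q with gcd(p, q) = 1: the period should be q.
p, q = 3, 8
x_rational = np.exp(1j * 2 * np.pi * (p / q) * n)

# (ii) alpha irrational: no period should be found.
x_irrational = np.exp(1j * 2 * np.pi * n / np.sqrt(2))

# (iii) example from the solution: x_n = n mod N, y_n = j (n mod M).
N, M = 4, 6
z = (n % N) + 1j * (n % M)
lcm = N * M // gcd(N, M)

print(smallest_period(x_rational, 50), smallest_period(x_irrational, 100),
      smallest_period(z, 50), lcm)
```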

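2.2. LSI System Acting on Signals in $\ell^p(\mathbb{Z})$

Prove that if $x \in \ell^p(\mathbb{Z})$ and $h \in \ell^1(\mathbb{Z})$, then $h * x$ is in $\ell^p(\mathbb{Z})$ as well.

Solution: [This solution is not correct.] By the inclusion property (1.37),

$$ 1 \le p \;\Rightarrow\; \ell^1(\mathbb{Z}) \subset \ell^p(\mathbb{Z}) \;\Rightarrow\; h \in \ell^p(\mathbb{Z}). $$

Then we use the definition of the $\ell^p$ norm, (1.36a), to show

$$ \|y\|_p^p = \sum_{n\in\mathbb{Z}} |y_n|^p \overset{(a)}{=} \sum_{n\in\mathbb{Z}} \Big|\sum_{k\in\mathbb{Z}} x_k h_{n-k}\Big|^p \overset{(b)}{\le} \sum_{n\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} |x_k|^p |h_{n-k}|^p \overset{(c)}{=} \sum_{k\in\mathbb{Z}}\sum_{n\in\mathbb{Z}} |x_k|^p |h_{n-k}|^p \overset{(d)}{=} \sum_{k\in\mathbb{Z}} |x_k|^p \sum_{n\in\mathbb{Z}} |h_{n-k}|^p \overset{(e)}{\le} M \sum_{k\in\mathbb{Z}} |x_k|^p \overset{(f)}{<} \infty, $$

where (a) follows from the definition of convolution, (2.59); (b) from the triangle inequality; in (c) we exchanged summations as per (1.194); in (d) we pulled $|x_k|^p$ in front of the sum over $n$; (e) from $h \in \ell^p(\mathbb{Z})$, with $M = \|h\|_p^p$; and (f) from $x \in \ell^p(\mathbb{Z})$.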
2.3. Filtering as a Projection

Given is a filter with impulse response $g_n$, $n \in \mathbb{Z}$. For the filters below, check whether they are orthogonal projections or not. (Hint: Use the frequency response $G(e^{j\omega})$ of the filter's impulse response.)

(i) $g_n = \delta_{n-k}$, $k \in \mathbb{Z}$;
(ii) $g_n = \tfrac12\delta_{n+1} + \delta_n + \tfrac12\delta_{n-1}$;
(iii) $g_n = \tfrac{1}{\sqrt{2}}\operatorname{sinc}(\pi n/2)$.

Consider now the class of real-valued discrete-time filters that perform an orthogonal projection. Give a precise characterization of this class (be more specific than just repeating the conditions for an operator to be an orthogonal projection).

Solution: We can represent our system as $y = Gx$, with the corresponding operator/matrix notation as in (2.63). To check that a filter (operator) is an orthogonal projection, we must check that it is idempotent and self-adjoint as in Definition 1.27.

Checking idempotency is easier in the Fourier domain, since the frequency-domain counterpart of $G^2 = G$ is $G^2(e^{j\omega}) = G(e^{j\omega})$. Checking self-adjointness is equivalent to checking that the operator matrix $G$ is Hermitian.

(i) We have that $G(e^{j\omega}) = e^{-j\omega k}$ and $G^2(e^{j\omega}) = e^{-j\omega 2k}$. Thus, $G^2(e^{j\omega}) \ne G(e^{j\omega})$ unless $k = 0$, in which case the filter is the identity, a trivial projection. For any $k \neq 0$, this filter is not a projection operator.
(ii) Similarly, we have

$$ G(e^{j\omega}) = \tfrac12 e^{j\omega} + 1 + \tfrac12 e^{-j\omega}, $$
$$ G^2(e^{j\omega}) = \Big(\tfrac12 e^{j\omega} + 1 + \tfrac12 e^{-j\omega}\Big)^2 = \tfrac14 e^{j2\omega} + e^{j\omega} + \tfrac32 + e^{-j\omega} + \tfrac14 e^{-j2\omega}. $$

Thus, $G^2(e^{j\omega}) \ne G(e^{j\omega})$; this filter is not a projection operator either.
(iii) From Table 2.4, we can rewrite $g_n$ as

$$ g_n = \frac{1}{\sqrt{2}}\operatorname{sinc}(\pi n/2) = \frac{1}{2\pi}\int_{-\pi/2}^{\pi/2} \sqrt{2}\, e^{j\omega n}\, d\omega, $$

yielding

$$ G(e^{j\omega}) = \begin{cases} \sqrt{2}, & |\omega| < \pi/2; \\ 0, & \text{otherwise}, \end{cases} \qquad G^2(e^{j\omega}) = \begin{cases} 2, & |\omega| < \pi/2; \\ 0, & \text{otherwise}. \end{cases} $$

Again, $G^2(e^{j\omega}) \ne G(e^{j\omega})$; this filter is not a projection operator either. Note that for $g_n = \tfrac12\operatorname{sinc}(\pi n/2)$ we would have had $G(e^{j\omega}) = 1$ for $|\omega| < \pi/2$ (and $0$ otherwise), and thus a projection.

Trying to satisfy idempotency in the most general case, we see that for $G^2(e^{j\omega})$ to be equal to $G(e^{j\omega})$, $G(e^{j\omega})$ can only be 1 or 0:

$$ G(e^{j\omega}) = \begin{cases} 1, & \omega \in R_1 \cup R_2 \cup \cdots \cup R_n; \\ 0, & \text{otherwise}, \end{cases} $$

where $R_1, R_2, \ldots, R_n$ are disjoint regions in $[-\pi, \pi)$.

Self-adjointness is satisfied if $g_n = g^*_{-n}$, or, in the Fourier domain,

$$ G(e^{j\omega}) = \sum_n g_n e^{-j\omega n} = \sum_n g^*_{-n} e^{-j\omega n} = \Big(\sum_n g_{-n} e^{j\omega n}\Big)^* = G^*(e^{j\omega}). $$

Since idempotency requires $G(e^{j\omega})$ to be real, self-adjointness is automatically satisfied.
2.4. Fibonacci Filter

Given is a Fibonacci sequence:

$$ [\ldots\;\; \boxed{1}\;\; 1\;\; 2\;\; 3\;\; 5\;\; 8\;\; 13\;\; \ldots], \tag{E2.4-1} $$

obtained via the following recursion:

$$ y_n = y_{n-1} + y_{n-2}, \tag{E2.4-2} $$

for $n = 2, 3, \ldots$, and $y_0 = y_1 = 1$.

(i) Create a filter whose impulse response $h_n$ is given by the Fibonacci sequence. Is this filter FIR or IIR? Is it BIBO stable?
(ii) Find the transfer function of the filter $H(z)$, draw the block diagram of the system and show its pole-zero plot.
(iii) Show that $h_n$ is a sum of two geometric series:

$$ h_n = a\alpha^n + b\beta^n. $$

Solution: The Fibonacci filter is one of the oldest known digital filters.

(i) We may assume that the Fibonacci sequence in (E2.4-1) is the impulse response of the system,

$$ h_n = [\ldots\;\; \boxed{1}\;\; 1\;\; 2\;\; 3\;\; 5\;\; 8\;\; 13\;\; \ldots]. $$

This is clearly an IIR filter, as its impulse response is not finitely supported. Moreover, it is clearly not BIBO stable, as a bounded input, $\delta_n$, creates an unbounded output, $y_n = h_n$.
(ii) To find the transfer function, we use the recursion (E2.4-2) to draw the block diagram of the system as in Figure E2.4-1(a). From the diagram, it is easy to see that

$$ Y(z)\,(1 - z^{-1} - z^{-2}) = X(z), $$

and thus

$$ H(z) = \frac{1}{1 - z^{-1} - z^{-2}} = \frac{1}{(1 - \alpha z^{-1})(1 - \beta z^{-1})}, $$

with

$$ \alpha = \frac{1+\sqrt{5}}{2}, \qquad \beta = \frac{1-\sqrt{5}}{2}. $$

The constant $\alpha$ is called the golden ratio. The pole-zero plot is given in Figure E2.4-1(b).
(iii) We use partial fraction expansion to get

$$ H(z) = \frac{1}{1 - z^{-1} - z^{-2}} = \frac{a}{1 - \alpha z^{-1}} + \frac{b}{1 - \beta z^{-1}}, \qquad \text{ROC} = \{z \mid |z| > \alpha\}, $$

with

$$ a = \frac{\alpha}{\sqrt{5}}, \qquad b = -\frac{\beta}{\sqrt{5}}. $$

We can then use Table 2.6 to recognize the above as the z-transform of the following sequence:

$$ h_n = (a\alpha^n + b\beta^n)\,u_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{n+1}\right] u_n. $$

(a) Block diagram. (b) Pole-zero plot.

Figure E2.4-1: Fibonacci filter.
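The closed form in Part (iii) can be checked against the recursion directly. This sketch uses plain NumPy and the constants $a = \alpha/\sqrt{5}$ and $b = -\beta/\sqrt{5}$ from the solution. It runs the difference equation $h_n = h_{n-1} + h_{n-2} + \delta_n$, whose impulse response is the Fibonacci sequence, and also confirms that the poles are $\alpha$ and $\beta$:

```python
import numpy as np

alpha = (1 + np.sqrt(5)) / 2     # golden ratio
beta = (1 - np.sqrt(5)) / 2
a = alpha / np.sqrt(5)
b = -beta / np.sqrt(5)

def fibonacci_impulse_response(length):
    """Impulse response of H(z) = 1 / (1 - z^{-1} - z^{-2}), computed by recursion."""
    h = np.zeros(length)
    for n in range(length):
        delta = 1.0 if n == 0 else 0.0
        h[n] = delta + (h[n - 1] if n >= 1 else 0.0) + (h[n - 2] if n >= 2 else 0.0)
    return h

n = np.arange(31)
h = fibonacci_impulse_response(len(n))
h_closed = a * alpha**n + b * beta**n

# Poles: roots of z^2 - z - 1 (multiply numerator and denominator of H(z) by z^2).
poles = np.sort(np.roots([1, -1, -1]))
print(h[:8], np.abs(h - h_closed).max(), poles)
```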



2.5. Circulant Matrices

Given is an $N \times N$ circulant matrix $C$ as in (1.227).

(i) Give a formula for $\det(C)$.
(ii) Give a simple test for the singularity of $C$.
(iii) Prove that the eigenvalues of $C$ are given by the frequency response (2.176a), the right eigenvectors are the columns of $F$, and the left eigenvectors are the rows of $F^*/N$.
(iv) Prove that $C^{-1}$ is circulant.
(v) Given two circulant matrices $C_1$ and $C_2$, show that they commute, $C_1 C_2 = C_2 C_1$, and that the result is circulant as well.

Solution: The solution to this problem is based on the fact that the DFT diagonalizes the circular convolution operator as in (2.177). Call $C_k$ the DFT coefficients of $c_0, c_1, \ldots, c_{N-1}$, and let $\Lambda = \operatorname{diag}([C_k]_{k=0}^{N-1})$. The results now follow easily.

(i) As the determinant of a product is equal to the product of determinants, from (2.177),

$$ \det(C) = \det(F\Lambda F^{-1}) = \det(F)\det(\Lambda)\det(F^{-1}) = \underbrace{\det(F)\det(F^{-1})}_{=1}\,\det(\Lambda) = \prod_{k=0}^{N-1} C_k. $$

(ii) From Part (i), $C$ is nonsingular if and only if none of the $C_k$ is zero.
(iii) This follows directly from (2.177). In the spectral decomposition (1.210a), the eigenvalues of $C$ are the elements of $\Lambda$; we said that $\Lambda = \operatorname{diag}([C_k]_{k=0}^{N-1})$, where $C_k$ are the DFT coefficients (frequency response) of the first column of $C$. To show the eigenvector properties, write (2.177) as

$$ \begin{aligned} CF &= F\Lambda, \\ C\,[v_0\;\; v_1\;\; \cdots\;\; v_{N-1}] &= [v_0\;\; v_1\;\; \cdots\;\; v_{N-1}]\,\operatorname{diag}([C_k]_{k=0}^{N-1}) = [C_0 v_0\;\; C_1 v_1\;\; \cdots\;\; C_{N-1} v_{N-1}], \\ C v_k &= C_k v_k, \qquad k = 0, 1, \ldots, N-1, \end{aligned} $$

implying that the columns of $F$ are the right eigenvectors of $C$. The argument follows similarly for left eigenvectors,

$$ \begin{aligned} F^{-1} C = \frac{1}{N} F^* C &= \Lambda F^{-1} = \operatorname{diag}([C_k]_{k=0}^{N-1})\,\frac{1}{N} F^*, \\ \frac{1}{N}\, v_k^* C &= \frac{1}{N}\, C_k v_k^*, \qquad k = 0, 1, \ldots, N-1, \end{aligned} $$

implying that the rows of $F^*/N$ are the left eigenvectors of $C$.
(iv) Assuming $C$ is nonsingular (see Part (ii)), write $C^{-1}$ as

$$ C^{-1} = (F\Lambda F^{-1})^{-1} = F\Lambda^{-1}F^{-1}. $$

Since $C^{-1}$ satisfies an equation of the same form as (2.177), it is circulant as well.
(v) This part follows similarly to the previous one,

$$ C_1 C_2 = F\Lambda_1 F^{-1} F\Lambda_2 F^{-1} = F(\Lambda_1\Lambda_2)F^{-1} = F\Lambda F^{-1}, \qquad \Lambda = \Lambda_1\Lambda_2, $$

again satisfying an equation of the same form as (2.177); $C_1 C_2$ is thus circulant as well. Because $\Lambda_1$ and $\Lambda_2$ are diagonal matrices, they commute, allowing us to reverse the process and show that $C_1$ and $C_2$ commute.
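The claims of Exercise 2.5 can be checked numerically. In this sketch, `V` is the matrix whose columns are the eigensequences $e^{j(2\pi/N)kn}$ of Table 2.11. It is deliberately not called $F$, so as not to presuppose the book's normalization of $F$. The $C_k$ are computed with `np.fft.fft` of the first column, and the circulant matrices are random and hypothetical:

```python
import numpy as np

def circulant(c):
    """N x N circulant matrix with first column c: C[n, m] = c[(n - m) mod N]."""
    N = len(c)
    return np.array([[c[(n - m) % N] for m in range(N)] for n in range(N)])

N = 6
rng = np.random.default_rng(5)
c1, c2 = rng.standard_normal(N), rng.standard_normal(N)
C1, C2 = circulant(c1), circulant(c2)

n = np.arange(N)
V = np.exp(1j * 2 * np.pi * np.outer(n, n) / N)   # column k is e^{j(2 pi/N) k n}
Vinv = V.conj().T / N                             # inverse, since V^* V = N I
Ck = np.fft.fft(c1)                               # eigenvalues C_k of C1

eig_ok = np.allclose(C1 @ V, V @ np.diag(Ck))              # (iii) right eigenvectors
left_ok = np.allclose(Vinv @ C1, np.diag(Ck) @ Vinv)       # (iii) left eigenvectors
det_ok = np.isclose(np.linalg.det(C1), np.prod(Ck).real)   # (i) det = prod C_k
inv = np.linalg.inv(C1)
inv_ok = np.allclose(inv, circulant(inv[:, 0]))            # (iv) inverse is circulant
prod = C1 @ C2
commute_ok = np.allclose(prod, C2 @ C1) and np.allclose(prod, circulant(prod[:, 0]))  # (v)
print(eig_ok, left_ok, det_ok, inv_ok, commute_ok)
```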

2.6. Smoothing Operators

Given is

$$ g_n = -\tfrac{1}{16}\delta_{n+1} + \tfrac14\delta_n - \tfrac{1}{16}\delta_{n-1}, $$

and the sequence of operations: filtering by $g$, followed by downsampling by 2, followed by upsampling by 2, followed by filtering by $g$, that is, $y = G U_2 D_2 G x = Px$.

(i) Is $g_n$ orthogonal to its even translates? Why?
(ii) If the answer to Part (i) is yes: If $G = G^T$, is $P$ an orthogonal projection? Why?
If the answer to Part (i) is no: Assume you are given $\tilde{g}_n$ such that $\{g_{n-2k}\}_{k\in\mathbb{Z}}$ and $\{\tilde{g}_{n-2k}\}_{k\in\mathbb{Z}}$ are a pair of biorthogonal bases, where

$$ \tilde{g}_n = \delta_{n+2} + a\delta_{n+1} + b\delta_n + a\delta_{n-1} + \delta_{n-2}. $$

How is the biorthogonality condition expressed in terms of their associated operators $G$ and $\tilde{G}$? Find constants $a$ and $b$. Is $P$ an orthogonal projection? Why?

Solution:

(i) Take $g_n$ and $g_{n-2}$. They overlap at $n = 1$, so $\langle g_n, g_{n-2}\rangle = g_1 g_{-1} = \tfrac{1}{256} \ne 0$, and thus, $g_n$ is not orthogonal to its even translates.
(ii) Since the answer to the first question is no, we now assume that we are given $\tilde{g}_n$ such that $\{g_{n-2k}\}_{k\in\mathbb{Z}}$ and $\{\tilde{g}_{n-2k}\}_{k\in\mathbb{Z}}$ are a pair of biorthogonal bases, implying that

$$ \langle g_n, \tilde{g}_{n-2k}\rangle = \delta_k, \tag{E2.6-1a} $$
$$ D_2\tilde{G}^T G\, U_2 = D_2\tilde{G}\, G\, U_2 = I, \tag{E2.6-1b} $$

where the second form uses that $\tilde{g}_n$ is symmetric, so $\tilde{G} = \tilde{G}^T$. To find $a$ and $b$, we use (E2.6-1a):

$$ k = 0: \quad -\tfrac{1}{16}a + \tfrac14 b - \tfrac{1}{16}a = 1, \qquad k = 1: \quad -\tfrac{1}{16}a + \tfrac14 = 0, $$

yielding $a = 4$ and $b = 6$, and giving rise to the second lowpass filter $\tilde{g}_n$.
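The constants $a = 4$ and $b = 6$, together with the projection question settled next, can be checked numerically. This sketch replaces the infinite operators by $N = 16$ periodic (circulant) versions of $G$ and $\tilde{G}$. That substitution is an assumption, harmless here because the filters are much shorter than $N$. $D_2$ is represented as the matrix that keeps even-indexed samples, with $U_2 = D_2^T$:

```python
import numpy as np

def circulant_filter(taps, offsets, N):
    """N x N circular convolution matrix for sum_i taps[i] * delta_{n - offsets[i]}."""
    h = np.zeros(N)
    for tap, offset in zip(taps, offsets):
        h[offset % N] += tap
    return np.array([[h[(n - m) % N] for m in range(N)] for n in range(N)])

N = 16
a, b = 4, 6
G = circulant_filter([-1/16, 1/4, -1/16], [-1, 0, 1], N)
Gt = circulant_filter([1, a, b, a, 1], [-2, -1, 0, 1, 2], N)   # tilde g
D2 = np.eye(N)[::2]          # keeps samples 0, 2, 4, ...
U2 = D2.T                    # inserts zeros between samples

biorth_ok = np.allclose(D2 @ Gt.T @ G @ U2, np.eye(N // 2))   # (E2.6-1b)
P = G @ U2 @ D2 @ Gt
idempotent = np.allclose(P @ P, P)
symmetric = np.allclose(P, P.T)
print(biorth_ok, idempotent, symmetric)
```

The expected outcome is a projection ($P^2 = P$) that is not symmetric, matching the conclusion reached below.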
We now check whether $P = G U_2 D_2 \tilde{G}$ is an orthogonal projection:

$$ P^2 = (G U_2 D_2 \tilde{G})(G U_2 D_2 \tilde{G}) = G U_2 (D_2 \tilde{G}\, G U_2) D_2 \tilde{G} = G U_2\, I\, D_2 \tilde{G} = P, $$
$$ P^T = (G U_2 D_2 \tilde{G})^T = \tilde{G}^T (U_2 D_2)^T G^T \ne P; $$

it is a projection, but not an orthogonal one.

2.7. Wiener Filtering

Consider Example 2.34 and the squared error

$$ E[(x_n - \hat{x}_n)^2], $$

with x = h * x + w. Find the minimum of the quadratic error by differentiating with 

respect to each filter coefficient $h_n$ and verifying that you get the solution (2.241d) as in the example.

Solution: TBD.

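2.8. Walsh Basis

The Walsh matrix $W_N$ of size $2^N \times 2^N$ is given by

$$ W_N = W_1 \otimes W_{N-1}, \qquad W_1 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad W_0 = [1], $$

where $\otimes$ is the Kronecker product (2.296).

(i) Give $W_2$ and $W_3$.
(ii) Show that $W_N$ is unitary (within a scale factor you should indicate).
(iii) Create a block matrix $T$,

$$ T = \begin{bmatrix} W_0 & & & \\ & \tfrac{1}{\sqrt{2}} W_1 & & \\ & & \tfrac12 W_2 & \\ & & & \ddots \end{bmatrix}, $$

and show that $T$ is unitary. Sketch the upper left corner of $T$.
(iv) The Walsh-Hadamard transform of length $N = 2^n$ is described via $W_n$. Derive an algorithm that uses $N\log_2 N$ additions for a length-$N$ transform.

Solution:

(i)

$$ W_2 = W_1 \otimes W_1 = \begin{bmatrix} W_1 & W_1 \\ W_1 & -W_1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}, $$

$$ W_3 = W_1 \otimes W_2 = \begin{bmatrix} W_2 & W_2 \\ W_2 & -W_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \end{bmatrix}. $$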
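(ii) Let $I_n$ denote the identity matrix of dimension $n$, and note that $W_N = W_N^T$. Then

$$ W_0^T W_0 = 1 = I_1, \qquad W_1^T W_1 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = 2I_2, $$

$$ W_2^T W_2 = \begin{bmatrix} W_1 & W_1 \\ W_1 & -W_1 \end{bmatrix}\begin{bmatrix} W_1 & W_1 \\ W_1 & -W_1 \end{bmatrix} = \begin{bmatrix} 2W_1^T W_1 & 0 \\ 0 & 2W_1^T W_1 \end{bmatrix} = 4I_4, $$

$$ \vdots $$

$$ W_N^T W_N = \begin{bmatrix} W_{N-1} & W_{N-1} \\ W_{N-1} & -W_{N-1} \end{bmatrix}\begin{bmatrix} W_{N-1} & W_{N-1} \\ W_{N-1} & -W_{N-1} \end{bmatrix} = 2^N I_{2^N}. $$

Hence, $(1/\sqrt{2^N})\,W_N$ is a $2^N \times 2^N$ unitary matrix, for $N \in \mathbb{N}$.
(iii)

$$ T = \begin{bmatrix} 1 & & & & & \\ & \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2} & & & \\ & \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} & & & \\ & & & \tfrac12 & \tfrac12 & \cdots \\ & & & \tfrac12 & -\tfrac12 & \cdots \\ & & & \vdots & \vdots & \ddots \end{bmatrix}. $$

Note that

$$ T^T T = \begin{bmatrix} W_0^T W_0 & & & \\ & \tfrac12 W_1^T W_1 & & \\ & & \tfrac14 W_2^T W_2 & \\ & & & \ddots \end{bmatrix} = \begin{bmatrix} I_1 & & & \\ & I_2 & & \\ & & I_4 & \\ & & & \ddots \end{bmatrix} = I, $$

and thus, $T$ is a unitary matrix.
(iv) A size-$2N$ Walsh-Hadamard transform $\mathrm{WHT}_{2N}$ is computed by evaluating the matrix product

$$ \mathrm{WHT}_{2N}\, x = \begin{bmatrix} \mathrm{WHT}_N & \mathrm{WHT}_N \\ \mathrm{WHT}_N & -\mathrm{WHT}_N \end{bmatrix} \begin{bmatrix} x^{(1)} \\ x^{(2)} \end{bmatrix} = \begin{bmatrix} \mathrm{WHT}_N\, x^{(1)} + \mathrm{WHT}_N\, x^{(2)} \\ \mathrm{WHT}_N\, x^{(1)} - \mathrm{WHT}_N\, x^{(2)} \end{bmatrix}, $$

where $x^{(1)}$ and $x^{(2)}$ are the first and second halves of $x$. It follows that the addition cost $\nu_N$ of $\mathrm{WHT}_N$ satisfies the recursion $\nu_{2N} = 2\nu_N + 2N$. Using the fact that $\nu_2 = 2$, we find $\nu_{2^n} = 2^n + (n-1)2^n = n2^n$, and the required $N\log_2 N$ cost for a length-$N$ transform follows.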
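The recursion in Part (iv) translates directly into a recursive fast Walsh-Hadamard transform. In the sketch below, the function names are our own. It builds $W_n$ by the Kronecker recursion for comparison and counts additions/subtractions, which should equal $N\log_2 N$ for $N = 2^n$:

```python
import numpy as np

def walsh(n):
    """Walsh matrix W_n of size 2^n x 2^n, built as W_n = W_1 (x) W_{n-1}, with W_0 = [1]."""
    W1 = np.array([[1, 1], [1, -1]])
    W = np.array([[1]])
    for _ in range(n):
        W = np.kron(W1, W)
    return W

def fast_wht(x):
    """Return (W_n x, number of additions/subtractions) for len(x) = 2^n."""
    x = np.asarray(x, dtype=float)
    L = len(x)
    if L == 1:
        return x.copy(), 0
    top, ops_top = fast_wht(x[: L // 2])          # WHT of the first half
    bottom, ops_bottom = fast_wht(x[L // 2 :])    # WHT of the second half
    # [W W; W -W][x1; x2] = [W x1 + W x2; W x1 - W x2] costs L more additions.
    return np.concatenate([top + bottom, top - bottom]), ops_top + ops_bottom + L

rng = np.random.default_rng(6)
results = []
for n in range(1, 7):
    x = rng.standard_normal(2**n)
    y, ops = fast_wht(x)
    results.append((n, np.allclose(y, walsh(n) @ x), ops == n * 2**n))
print(results)
```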



Exercises 

2.1. Sinusoidal Sequence 

Given is the following sinusoidal sequence: 

7T 

x n = sin— n, ?/ n 



x n (u n — u n _ N ), 



(P2.1-1) 



where u n is the Heaviside sequence, 
(i) For N = 8, 12 and 16, sketch y n . 
(ii) For which of the above three values of N is the following true? Sketch z n 

A-CS 
2.2. Discrete Laplacian Operator
Given is the following system:

$$ y_n = x_{n-1} - 2x_n + x_{n+1}. $$

(i) Is this system linear? Shift-invariant? Causal? Memoryless? BIBO stable?
(ii) Does it have a matrix representation? If yes, write it down.
(iii) For each of the following inputs, write and sketch the corresponding output of the system:

$$ x_{0,n} = c; \qquad x_{1,n} = \delta_n; \qquad x_{2,n} = u_n, $$

where $\delta_n$ is the Kronecker delta sequence, and $u_n$ is the Heaviside sequence. Explain the effect of this system.

2.3. Linear and Shift-Invariant Difference Equations
Consider the difference equation (2.54).

(i) Show, possibly using a simple example, that if the initial conditions are nonzero, the system is not linear and not shift invariant.
(ii) Show that if the initial conditions are zero, then (a) the homogeneous solution is zero, (b) the system is linear and (c) the system is shift invariant.

2.4. Geometric Sequences and Their Properties

Given is a geometric sequence as in (2.6) with $|\alpha| < 1$,

$$ x_n = \begin{cases} 0, & n < 0; \\ \gamma\,\alpha^n, & n \ge 0. \end{cases} $$

(i) Show that $\gamma = \sqrt{1 - \alpha^2}$ leads to a unit-norm sequence, that is, $\|x\|_2 = 1$.
(ii) Compute the autocorrelation of this unit-norm geometric sequence.
(iii) Compute the convolution of this unit-norm geometric sequence with itself.
(iv) Call $y_n$ a different unit-norm geometric sequence. Compute the crosscorrelation between $x_n$ and $y_n$, as well as their convolution.

2.5. Deterministic Autocorrelation and Crosscorrelation

Consider deterministic autocorrelation and crosscorrelation sequences and their DTFTs as defined in this chapter. Show that:

(i) $a_n = a^*_{-n}$, (2.17a), and $|a_n| \le a_0$;
(ii) $c_n$ is in general not symmetric but rather Hermitian symmetric as in (2.20a), and $C(e^{j\omega}) = X(e^{j\omega})\,Y^*(e^{j\omega})$;
(iii) The generalized Parseval's equality (2.104) holds.

2.6. Modulation Property of the DTFT

Given are two sequences $x$ and $h$, both in $\ell^1(\mathbb{Z})$. Verify that the convolution in frequency property (2.94) holds:

$$ h_n x_n \;\overset{\text{DTFT}}{\longleftrightarrow}\; \frac{1}{2\pi}\, H *_\omega X. $$

2.7. Thirdband Filter

Given is the following filter:

$$ h_n = \sqrt{3}\,\frac{\sin(\pi n/3)}{\pi n}. $$

(i) Find the DTFT of $h_n$. What kind of filter is it?
(ii) Given $x_n = (\delta_n + \delta_{n-1})/2$ and $y = h * x$, sketch $|Y(e^{j\omega})|$.

2.8. Bandlimitedness 

Are signals with all odd samples equal to zero bandlimited? Prove or disprove (by con- 
structing a counter-example) the assertion and draw the spectrum of an example sequence 
demonstrating your point. 

2.9. ROC of z-Transform

Compute the z-transforms and associated ROCs for the following sequences:

(i) $\delta_n$.
(ii) $\delta_{n-k}$.
(iii) $\alpha^n u_n$.
(iv) $-\alpha^n u_{-n-1}$.
(v) $n\alpha^n u_n$.
(vi) $-n\alpha^n u_{-n-1}$.
(vii) $\cos(\omega_0 n)\, u_n$.
(viii) $\sin(\omega_0 n)\, u_n$.
(ix) $\alpha^n$ for $0 \le n < N$, and $0$ otherwise.

2.10. Regions of Convergence

Consider rational z-transforms $X(z)$ with $\mathrm{ROC}_X$ and $Y(z)$ with $\mathrm{ROC}_Y$. Find the ROCs of the following:

$$ A(z) = X(z) + Y(z); \qquad B(z) = X(z)\,Y(z). $$

2.11. Orthogonality

Consider a sequence $p_n$ with z-transform $P(z)$. The goal is to find sequences satisfying an orthogonality constraint with respect to all shifts,

$$ \langle p_n, p_{n-k}\rangle = \delta_k \;\Leftrightarrow\; P(z)\,P(z^{-1}) = 1. $$

(i) If $p_n$ is FIR, that is, $P(z)$ is a polynomial, show that the only possible solution is

$$ p_n = \pm\delta_{n-\ell}, \quad \text{with arbitrary } \ell \in \mathbb{Z}. \tag{P2.11-1} $$

(ii) If the sequence $p_n$ has a rational z-transform $P(z)$, show that

$$ P(z) = \frac{\tilde{A}(z)}{A(z)}, \tag{P2.11-2} $$

where $A(z)$ is a polynomial of degree $(L-1)$ and $\tilde{A}(z) = z^{-L+1}A(z^{-1})$, will be a solution.

2.12. Linear and Circular Convolution as Polynomial Products
Given two polynomials of degree $N-1$,

$$ A(z) = \sum_{n=0}^{N-1} a_n z^n, \qquad B(z) = \sum_{n=0}^{N-1} b_n z^n, \tag{P2.12-1} $$

show that $C(z) = A(z)B(z) \bmod (z^N - 1) = \sum_{n=0}^{N-1} c_n z^n$ is equivalent to the circular convolution of the sequences $[a_0\;\; a_1\;\; \ldots\;\; a_{N-1}]$ and $[b_0\;\; b_1\;\; \ldots\;\; b_{N-1}]$, or

$$ c_n = \sum_{k=0}^{N-1} a_{(n-k) \bmod N}\, b_k. \tag{P2.12-2} $$

(Hint: The operation $A(z)B(z) \bmod (z^N - 1)$ is the remainder of the division of $A(z)B(z)$ by $(z^N - 1)$.)
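Here is a numerical sanity check of Exercise 2.12 (not a proof). Reducing a polynomial modulo $z^N - 1$ amounts to replacing each $z^n$ by $z^{n \bmod N}$, since $z^N \equiv 1$. The helper `reduce_mod_zN_minus_1` is introduced here for illustration only. The coefficients of the full product $A(z)B(z)$, which form the linear convolution, are computed with `np.convolve`:

```python
import numpy as np

def reduce_mod_zN_minus_1(coeffs, N):
    """Coefficients (increasing powers of z) of a polynomial reduced modulo z^N - 1."""
    out = np.zeros(N)
    for n, c in enumerate(coeffs):
        out[n % N] += c          # z^n = z^(n mod N) modulo z^N - 1
    return out

N = 5
rng = np.random.default_rng(7)
a = rng.standard_normal(N)
b = rng.standard_normal(N)

linear = np.convolve(a, b)                   # coefficients of A(z)B(z), degree 2N - 2
c_poly = reduce_mod_zN_minus_1(linear, N)    # A(z)B(z) mod (z^N - 1)
c_circ = np.array([sum(a[(n - k) % N] * b[k] for k in range(N)) for n in range(N)])  # (P2.12-2)
c_dft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real   # circular convolution via the DFT
print(np.allclose(c_poly, c_circ), np.allclose(c_circ, c_dft))
```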
2.13. Deterministic Autocorrelation

The deterministic autocorrelation for a real-valued stable sequence $x_n$ is defined as

$$ a_n = \sum_{k\in\mathbb{Z}} x_k x_{k-n}. $$

(i) Show that the z-transform of $a_n$ is $A(z) = X(z)X(z^{-1})$. Determine the region of convergence for $A(z)$.
(ii) If $x_n = \alpha^n u_n$, with $u_n$ the Heaviside sequence (2.10), show the pole-zero plot for $A(z)$, including the region of convergence. Find $a_n$ by evaluating the inverse z-transform of $A(z)$.



a3.0 [October 2011] CC by-nc-nd

Comments to book-errata@FourierAndWavelets.org

Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

(iii) Specify another sequence, $y_n$, that is not equal to $x_n$ from Part (ii) but has the same deterministic autocorrelation sequence as $x_n$.

(iv) Specify a third sequence, $v_n$, that is equal to neither $x_n$ nor $y_n$ but has the same deterministic autocorrelation sequence as $x_n$.

2.14. Allpass Filters

Consider allpass filters where

$$ H(z) = \prod_{i=1}^{N} \frac{z^{-1} - a_i^*}{1 - a_i z^{-1}}. $$

(i) Assume the filter has real coefficients. Show pole-zero locations, and that the numerator and denominator polynomials are mirrors of each other, that is, $D(z) = z^{-N}N(z^{-1})$.
(ii) Given $h_n$, the causal, real-coefficient impulse response of a stable allpass filter, give its deterministic autocorrelation $a_k = \sum_n h_n h_{n-k}$. Show that the set $\{h_{n-k}\}$, $k \in \mathbb{Z}$, is an orthonormal basis for $\ell^2(\mathbb{Z})$.
(iii) Show that the set $\{h_{n-2k}\}$ is an orthonormal set but not a basis for $\ell^2(\mathbb{Z})$.

2.15. Allpass System

Given is an allpass system,

$$ H(z) = C \prod_{i=1}^{M} \frac{z^{-1} - \alpha_i^*}{1 - \alpha_i z^{-1}}. $$

(i) Find its magnitude on the unit circle, that is, $|H(e^{j\omega})|$. Specify the value of $C$ for that magnitude to equal 1.
(ii) Show that this filter will preserve the norm of any sequence filtered by it.

2.16. Block Circulant Matrices

A block-circulant matrix of size $NM \times NM$ is like a circulant matrix of size $N \times N$, except that the elements are now blocks of size $M \times M$. For example, given two $M \times M$ matrices $A$ and $B$,

$$ C = \begin{bmatrix} A & B \\ B & A \end{bmatrix} $$

is a $2M \times 2M$ block-circulant matrix. Show that block-circulant matrices are block-diagonalized by block Fourier transforms of size $NM \times NM$ defined as

$$ F = F_N \otimes I_M, $$

where $F_N$ is an $N \times N$ Fourier matrix, $I_M$ is an $M \times M$ identity matrix and $\otimes$ is the Kronecker product (2.296).
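Before attempting the proof, the claim can be checked numerically. This sketch assumes $F_N$ is the DFT matrix with entries $e^{-j(2\pi/N)kn}$. It builds a random block-circulant matrix from hypothetical blocks $B_0, \ldots, B_{N-1}$, where block $(n, m)$ is $B_{(n-m) \bmod N}$. The $2M \times 2M$ example above is the case $N = 2$, $B_0 = A$, $B_1 = B$:

```python
import numpy as np

N, M = 3, 2
rng = np.random.default_rng(8)
B = [rng.standard_normal((M, M)) for _ in range(N)]   # first block column B_0, ..., B_{N-1}
C = np.block([[B[(n - m) % N] for m in range(N)] for n in range(N)])

k = np.arange(N)
F_N = np.exp(-2j * np.pi * np.outer(k, k) / N)        # N x N DFT matrix
F = np.kron(F_N, np.eye(M))                           # block Fourier transform
D = F @ C @ np.linalg.inv(F)

# Off-diagonal M x M blocks should vanish; diagonal block i should be sum_m e^{-j 2 pi i m / N} B_m.
off_diag = max(np.abs(D[i*M:(i+1)*M, j*M:(j+1)*M]).max()
               for i in range(N) for j in range(N) if i != j)
diag_err = max(np.abs(D[i*M:(i+1)*M, i*M:(i+1)*M]
                      - sum(np.exp(-2j * np.pi * i * m / N) * B[m] for m in range(N))).max()
               for i in range(N))
print(off_diag, diag_err)
```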

2.17. Pattern Recognition 

In pattern recognition, it is sometimes useful to expand a signal using the desired pattern (template) and its shifts as basis functions. For simplicity, consider a signal of length $N$, $x_n$, $n = 0, \ldots, N-1$, and a pattern $p_n$, $n = 0, \ldots, N-1$. Then, choose as basis functions

$$\varphi_{k,n} = p_{(n-k) \bmod N}, \qquad k = 0, \ldots, N-1,$$

that is, circular shifts of $p_n$.

(i) Derive a simple condition on $p_n$ so that any $x_n$ can be written as a linear combination of $\{\varphi_k\}$.

(ii) Assuming the previous condition is met, give the coefficients $\alpha_k$ of the expansion

$$x_n = \sum_{k=0}^{N-1} \alpha_k \varphi_{k,n}.$$
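Here is one possible route for (ii), sketched numerically. The expansion $x_n = \sum_k \alpha_k p_{(n-k) \bmod N}$ is the circular convolution of $\alpha$ and $p$, so the DFT turns it into a pointwise product. Dividing by the DFT of $p$ then recovers the coefficients whenever that DFT has no zeros. The function name `template_coefficients` and the tolerance `1e-12` are our own assumptions, and the random pattern and signal are test data only.

```python
import numpy as np

def template_coefficients(x, p, tol=1e-12):
    """Coefficients alpha_k with x_n = sum_k alpha_k p_{(n-k) mod N}.
    In the DFT domain X_k = A_k P_k, so every P_k must be nonzero."""
    P = np.fft.fft(p)
    if np.min(np.abs(P)) < tol:
        raise ValueError("the DFT of the pattern has a zero; its shifts do not span C^N")
    return np.fft.ifft(np.fft.fft(x) / P)

rng = np.random.default_rng(1)
N = 8
p = rng.standard_normal(N)
x = rng.standard_normal(N)
alpha = template_coefficients(x, p)

# rebuild x from the circular shifts of p: row k of phi is p_{(n-k) mod N}
phi = np.array([np.roll(p, k) for k in range(N)])
x_rec = alpha @ phi
print(np.allclose(x_rec, x))  # True
```

A constant pattern such as `np.ones(N)` makes `template_coefficients` raise an error, because all of its DFT coefficients except the first are zero.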

2.18. Computing Linear Convolution with the DFT 

Prove that the linear convolution of two sequences of lengths $M$ and $L$ can be computed using DFTs of size $N \ge M + L - 1$, and show how to do it.
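Below is a minimal sketch of the zero-padding approach. It assumes real-valued sequences and NumPy's FFT routines, and the function name `linear_convolution_via_dft` is ours. Padding both sequences to a common length $N \ge M + L - 1$ makes the circular convolution computed through the DFT coincide with the linear one. The check against `np.convolve` illustrates this.

```python
import numpy as np

def linear_convolution_via_dft(x, h, N=None):
    """Linear convolution of a length-M and a length-L real sequence computed
    with length-N DFTs, N >= M + L - 1 (default: N = M + L - 1)."""
    M, L = len(x), len(h)
    if N is None:
        N = M + L - 1
    if N < M + L - 1:
        raise ValueError("N must be at least M + L - 1 to avoid wrap-around")
    X = np.fft.fft(x, n=N)        # fft zero-pads x to length N
    H = np.fft.fft(h, n=N)
    y = np.fft.ifft(X * H).real   # real inputs give a real result up to rounding
    return y[:M + L - 1]          # samples M + L - 1, ..., N - 1 are zero

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, 1.0])
print(np.allclose(linear_convolution_via_dft(x, h), np.convolve(x, h)))  # True
```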
2.19. DFT Properties

Find the DFT pairs for the following:

(i) Time-reversed sequence $x_{-n \bmod N}$;
(ii) Real symmetric sequence $x_n = x_{-n \bmod N}$ and real antisymmetric sequence $x_n = -x_{-n \bmod N}$;
(iii) Convolution in frequency property (2.168).

2.20. Downsampling by N 

Prove the z-transform and DTFT pairs for downsampling by $N$ given by (2.183) and (2.184), respectively.

2.21. Downsampling 

Given is a length-$N$ sequence $x_n$. Let $N = m_0 M$, where $m_0$ and $M$ are both positive integers, and let $y_n$ be the sequence obtained by downsampling $x_n$ by $M$, that is,

$$y_n = x_{Mn}, \qquad n = 0, 1, \ldots, m_0 - 1.$$

Let $Y_k$ be the length-$(N/M)$ DFT of the sequence $y_n$, and $X_k$ be the length-$N$ DFT of the sequence $x_n$. Prove the following:

(i) For $M = 2$, the DFT of the downsampled sequence $y_n$ is

$$Y_k = \frac{1}{2}\left(X_k + X_{k+N/2}\right), \qquad k = 0, 1, \ldots, \frac{N}{2} - 1.$$

(ii) For arbitrary $M$, the DFT of the downsampled sequence $y_n$ is

$$Y_k = \frac{1}{M} \sum_{i=0}^{M-1} X_{k+iN/M}, \qquad k = 0, 1, \ldots, \frac{N}{M} - 1.$$
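The following is a quick numerical check of the identity in (ii), not a substitute for the proof. The sizes $N = 12$, $M = 3$ and the random test sequence are arbitrary. `np.fft.fft` computes the unnormalized DFT $X_k = \sum_n x_n W_N^{kn}$, which matches the convention assumed here.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 12, 3                 # N = m0 M with m0 = 4; arbitrary test sizes
x = rng.standard_normal(N)
y = x[::M]                   # y_n = x_{Mn}, n = 0, ..., N/M - 1

X = np.fft.fft(x)            # length-N DFT of x
Y = np.fft.fft(y)            # length-(N/M) DFT of y

k = np.arange(N // M)
# right-hand side of (ii): average of the M aliased copies X_{k + iN/M}
Y_alias = sum(X[k + i * (N // M)] for i in range(M)) / M
print(np.allclose(Y, Y_alias))  # True
```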



2.22. Multirate System with Different Sampling Rates

Consider the system

$$y = H D_4 G D_2 F D_3 x.$$

(i) Draw a block diagram.

(ii) Derive an equivalent system consisting of a single filter block and a single downsampling matrix. Write the z-transform of the equivalent system as a function of $F(z)$, $G(z)$ and $H(z)$.

2.23. Interchange of Filtering and Sampling Rate Change 

(i) Prove that downsampling by 2 followed by filtering with $G(z)$ is equivalent to filtering with $G(z^2)$ followed by downsampling by 2.

(ii) Prove that filtering with $G(z)$ followed by upsampling by 2 is equivalent to upsampling by 2 followed by filtering with $G(z^2)$.
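Both identities can be sanity-checked on finite-length sequences that start at $n = 0$, with filtering implemented as linear convolution. The helper `upsample2` is our own; applied to the impulse response $g_n$, it gives the impulse response of $G(z^2)$. The random sequence and filter are arbitrary test data, and the check illustrates rather than proves the identities.

```python
import numpy as np

def upsample2(x):
    """Upsample by 2: insert one zero between consecutive samples."""
    y = np.zeros(2 * len(x) - 1)
    y[::2] = x
    return y

rng = np.random.default_rng(3)
x = rng.standard_normal(21)
g = rng.standard_normal(4)
g2 = upsample2(g)            # impulse response of G(z^2)

# (i) downsample, then filter by G(z)  vs  filter by G(z^2), then downsample
lhs_i = np.convolve(x[::2], g)
rhs_i = np.convolve(x, g2)[::2]

# (ii) filter by G(z), then upsample  vs  upsample, then filter by G(z^2)
lhs_ii = upsample2(np.convolve(x, g))
rhs_ii = np.convolve(upsample2(x), g2)

print(np.allclose(lhs_i, rhs_i), np.allclose(lhs_ii, rhs_ii))  # True True
```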

2.24. Multirate Identities 

(i) Find the overall transfer function $Y(z)/X(z)$ of the system in Figure P2.24-1.

Figure P2.24-1: Multirate system 1.

(ii) In the system in Figure P2.24-2, if $H(z) = H_0(z^2) + z^{-1} H_1(z^2)$, prove that $Y(z) = X(z) H_0(z)$.

Figure P2.24-2: Multirate system 2.



(iii) Let $H(z)$, $F(z)$ and $G(z)$ be filters satisfying

$$H(z)G(z) + H(-z)G(-z) = 2, \qquad \text{(P2.24-1a)}$$
$$H(z)F(z) + H(-z)F(-z) = 0. \qquad \text{(P2.24-1b)}$$

Prove that for one of the systems in Figure P2.24-3, $Y(z)/X(z) = 1$, while for the other, $Y(z)/X(z) = 0$.

[Block diagrams: input $X(z)$, filters $G(z)$ and $H(z)$, output $Y(z)$.]

Figure P2.24-3: Multirate system 3.

2.25. Interchange of Multirate Operations and Filtering 
For the system given by the input-output relation

$$y = D_2 A D_2 A D_2 A x,$$

where $A$ is a matrix representing a filter:

(i) With the use of identities for the interchange of multirate operations and filtering, find the simplest equivalent system, $y = D_n H x$. Specify the downsampling factor $n$ and write $H$ in the z-transform and Fourier domains.

(ii) If $A$ is an ideal halfband lowpass filter, draw $|H(e^{j\omega})|$, clearly specifying the cut-off frequencies.

(iii) If $A$ is an ideal halfband highpass filter, draw $|H(e^{j\omega})|$, clearly specifying the cut-off frequencies. Is this transfer function capturing the highest frequency content in the sequence $x$? Explain.

2.26. Commutativity of Up- and Downsampling

Prove that downsampling by M and upsampling by N commute if and only if M and N 
are coprime. 

2.27. Combinations of Upsampling and Downsampling 
Using matrix notation, compare:

(i) $U_3 D_2 x$ to $D_2 U_3 x$;
(ii) $U_4 D_2 x$ to $D_2 U_4 x$.

Explain the outcome of these comparisons.
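For 2.26 and 2.27, the sketch below compares $D_M U_N x$ with $U_N D_M x$ on a finite test sequence for a few pairs $(M, N)$. The helpers `up`, `down`, and `commute` are illustrative names of ours. Only the indices that both finite computations determine exactly are compared. This is a numerical illustration, not a proof: the two results should agree when $\gcd(M, N) = 1$ and differ otherwise.

```python
import numpy as np
from math import gcd

def up(x, N):
    """Upsample by N: x_k is placed at index N k, zeros elsewhere."""
    y = np.zeros(N * len(x))
    y[::N] = x
    return y

def down(x, M):
    """Downsample by M: keep x_{M n}."""
    return x[::M]

def commute(M, N, length=60):
    """Compare D_M U_N x with U_N D_M x on a finite test sequence."""
    x = np.arange(1.0, length + 1.0)   # nonzero entries so that mismatches show
    a = down(up(x, N), M)
    b = up(down(x, M), N)
    K = min(len(a), len(b))            # indices both computations get exactly
    return np.array_equal(a[:K], b[:K])

for M, N in [(2, 3), (3, 2), (2, 4), (4, 6), (3, 5)]:
    print(M, N, "coprime:", gcd(M, N) == 1, "commute:", commute(M, N))
```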

2.28. Periodically Shift-Varying Systems 

Show that an LPSV system of period $N$ can be implemented with a polyphase transform followed by upsampling by $N$, $N$ filter operations and a summation.
2.29. Convolution and Sum of Discrete Random Variables

A random variable is discrete when it takes values in a countable set. A discrete random variable $x$ has a probability mass function (PMF) $p_x$ defined by $p_x(k) = P(x = k)$. Let $x$ and $y$ be independent, integer-valued random variables with PMFs $p_x$ and $p_y$. Show that $z = x + y$ has PMF $p_z = p_x * p_y$.
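As an illustration, take two independent fair dice with values 1 through 6 (our choice of example). Applying `np.convolve` to the two PMFs gives the PMF of the sum, which we compare with a brute-force enumeration of all outcome pairs.

```python
import numpy as np
from itertools import product

p_x = np.full(6, 1 / 6)      # fair die: index i holds P(x = i + 1)
p_y = np.full(6, 1 / 6)
p_z = np.convolve(p_x, p_y)  # index i holds P(z = i + 2), z in {2, ..., 12}

# brute force: by independence, add P(x = a) P(y = b) into P(z = a + b)
brute = np.zeros(11)
for a, b in product(range(1, 7), repeat=2):
    brute[a + b - 2] += p_x[a - 1] * p_y[b - 1]

print(np.allclose(p_z, brute))  # True
print(p_z[7 - 2])               # P(z = 7), that is, 6/36
```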

2.30. Toeplitz Matrix-Vector Products

Given a size-$(N \times N)$ Toeplitz matrix $T$ and a length-$N$ vector $x$, show that the product $Tx$ can be computed with $O(N \log_2 N)$ operations. The method consists in extending $T$ into a circulant matrix $C$. What is the minimum size of $C$, and how does it change if $T$ is symmetric?
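Below is a minimal sketch of the circulant-embedding method. It assumes $T$ is given by its first column `c` and first row `r`, with `r[0] == c[0]`, and embeds $T$ in a circulant matrix of size $2N - 1$. That size suffices for a general Toeplitz matrix; whether it is the minimum, and how symmetry changes it, is left to the exercise. The dense helper exists only to check the result. The $O(N \log N)$ cost assumes an FFT for length $2N - 1$; NumPy's `np.fft.fft` accepts arbitrary lengths.

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """T x for the N x N Toeplitz matrix with first column c and first row r
    (r[0] == c[0]), via a (2N-1) x (2N-1) circulant embedding and FFTs."""
    N = len(x)
    col = np.concatenate([c, r[:0:-1]])            # first column of the circulant C
    x_pad = np.concatenate([x, np.zeros(N - 1)])   # the vector [x; 0]
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x_pad))  # C [x; 0] by circular convolution
    return y[:N]                                    # the top block equals T x

def toeplitz_dense(c, r):
    """Explicit Toeplitz matrix, used only to check the fast product."""
    N = len(c)
    return np.array([[c[i - j] if i >= j else r[j - i] for j in range(N)]
                     for i in range(N)])

rng = np.random.default_rng(4)
N = 6
c = rng.standard_normal(N)
r = np.concatenate([[c[0]], rng.standard_normal(N - 1)])
x = rng.standard_normal(N)
print(np.allclose(toeplitz_matvec(c, r, x), toeplitz_dense(c, r) @ x))  # True
```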



a3.0 [October 2011] CC by-nc-nd Comments to book-errata@FourierAndWavelets.org



Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal



Chapter 3

Functions and Continuous-Time Systems

Contents

3.1 Introduction
3.2 Functions
3.3 Systems
3.4 Fourier Transform
3.5 Fourier Series
3.6 Continuous Stochastic Processes and Systems
Chapter at a Glance
Historical Remarks
Further Reading
Exercises with Solutions
Exercises

As in Chapter 2, the key word in the title of this chapter is time; in contrast with Chapter 2, discrete is now replaced with continuous. Time is now uncountable rather than sampled as before. Our vectors are now functions (the domain is continuous time), and as we saw in Chapter 1, these form the vector space $\mathbb{C}^{\mathbb{R}}$. Restricting to the normed vector spaces $\mathcal{L}^2(\mathbb{R})$ and $\mathcal{L}^\infty(\mathbb{R})$ corresponds to the physical phenomena of finite energy and boundedness. Operators that map a function to a function are called continuous-time systems. A continuous-time system with the shift-invariance property is described by convolution with the system's impulse response. An impulse response is a function, and we will see that it is appropriate to require that it belong to $\mathcal{L}^1(\mathbb{R})$ or $\mathcal{L}^2(\mathbb{R})$. Once the convolution operation is defined, spectral theory allows us to construct the Fourier transform.

As in Chapter 2, the above discussion implicitly assumed that the underlying domain, time, is infinite. In practice, we glimpse only a finite portion of time. In Chapter 2, we dealt with this issue by assuming that a finite sequence was circularly extended, leading to the notion of circular convolution and the discrete Fourier transform. In this chapter, finitely-supported functions that are circularly extended (periodized) will likewise have an appropriate circular convolution and an appropriate Fourier transform, the Fourier series.

3.1 Introduction 

In most of Chapter 2, we considered sequences; here, we look at functions defined for all times $t \in \mathbb{R}$. Such a function,

$$x(t), \qquad t \in \mathbb{R}, \qquad (3.1)$$

could be the sound pressure sensed by a microphone, or the temperature at a sensor, etc.; the key is that a value exists at every time.

In real life, we observe only a finite portion of a function on the real line,

$$x(t), \qquad t \in [0, T). \qquad (3.2)$$

Moreover, computations are always done on finite inputs, requiring a decision on what happens at the boundaries. As with sequences, we typically extend functions on a finite interval circularly (periodizing the function in the process),

$$x(t + T) = x(t), \qquad t \in \mathbb{R}. \qquad (3.3)$$

While finite-length functions in (3.2) and infinite-length periodic ones in (3.3) have a fundamentally different character, we will use the same types of tools to analyze them. Techniques designed explicitly for finite-length functions are mathematically rooted in treating the function as one period of an infinite-length periodic function. The consequences of this implicit periodization are central in signal processing.

As in Chapter 2, we thus define two broad classes of functions for which to develop our tools:

(i) Functions on the real line are the vector space $\mathbb{C}^{\mathbb{R}}$ of functions with domain $\mathbb{R}$, as defined in (1.17c). The support of a function may be a proper subset of $\mathbb{R}$; for example, we will often consider functions on the real line that are nonzero only at nonnegative time.

(ii) Functions on a finite interval, without loss of generality, have support in $[0, T)$. The tools we will develop do not treat the vector space of functions with support in $[0, T)$ generically, but rather as functions defined on a circular domain.

Functions we consider are typically bounded, often smooth, and sometimes periodic. An important class of functions consists of those that are bandlimited, that is, the maximum frequency present in the function is finite. This class plays an important role in signal processing, since bandlimited functions can be recovered exactly from samples, making a natural connection to discrete-time sequences.

Given functions (signals, vectors), one can apply operators (systems, filters); an example is the voltage produced at a microphone in response to pressure variations. These operators map input functions into output ones, and since they involve continuous-time functions, they are usually called continuous-time systems (operators). Many continuous-time systems, such as the microphone described above, are physical systems governed by differential equations: for example, the sound waves reaching a microphone obey the wave equation. Often, these differential equations have a smoothing effect, and functions with singularities (such as a point of discontinuity) are smoothed by the time they are observed. In the microphone example, a gunshot is first smoothed by the wave equation, and further smoothed by the microphone itself. We mention these effects to emphasize the difference between mathematical abstractions and observed phenomena. These differential equations are often linear, or even linear and shift-invariant; in Chapter 2, the same was true of difference equations.



Chapter Outline 

From this short introduction, the outline of the chapter follows naturally. Section 3.2 discusses continuous-time functions, where we introduce function spaces of interest and comment on local and global smoothness. We follow with a short overview of continuous-time systems in Section 3.3, particularly LSI systems stemming from linear constant-coefficient differential equations. This discussion leads to the convolution operator and its properties, such as stability. Section 3.4 reviews the Fourier transform and its properties. We emphasize the eigenfunction property of complex exponentials and give key relations of the Fourier transform, together with properties for certain function spaces. We briefly discuss the Laplace transform, an extension of the Fourier transform akin to the z-transform seen in the previous chapter, allowing us to deal with larger classes of functions. In Section 3.5, we discuss the natural orthonormal basis for periodic functions given by the Fourier series. We study circular convolution and the eigenfunction property of complex exponentials as well as the properties of the Fourier series. Then, we explore the duality with the DTFT for sequences seen in Chapter 2. In Section 3.6, we study continuous stochastic processes and systems.



3.2 Functions 

3.2.1 Functions on the Real Line 

The set of functions in (3.1), where $x(t)$ is either real or complex, together with vector addition and scalar multiplication, forms a vector space (see Definition 1.1). The inner product between two functions on the real line is defined in (1.20c), and induces the standard $\mathcal{L}^2$ (or Euclidean) norm (1.23c). Other norms of interest are the $\mathcal{L}^1$ norm from (1.38a) with $p = 1$, and the $\mathcal{L}^\infty$ norm from (1.38b). We now look into a few spaces of interest.

Function Spaces 

Space of Square-Integrable Functions $\mathcal{L}^2(\mathbb{R})$ The constraint of a finite square norm is necessary for turning the vector space $\mathbb{C}^{\mathbb{R}}$ defined in (1.17c) into the Hilbert space of finite-energy functions $\mathcal{L}^2(\mathbb{R})$. As for sequences, this space affords a geometric view, for example, the orthogonal projection theorem.



Space of Bounded Functions $\mathcal{L}^\infty(\mathbb{R})$ The space of bounded functions contains all functions $x(t)$ such that, for some finite $M$, $|x(t)| < M$ for all $t \in \mathbb{R}$. This space is denoted $\mathcal{L}^\infty(\mathbb{R})$ since it consists of functions with finite $\mathcal{L}^\infty$ norm.



Space of Absolutely-Integrable Functions $\mathcal{L}^1(\mathbb{R})$ The space of absolutely-integrable functions consists of those with finite $\mathcal{L}^1$ norm.

The $\mathcal{L}^p$ spaces do not satisfy a nesting property as the $\ell^p$ sequence spaces do in (1.37). For instance, $1/(1 + |t|)$ is in $\mathcal{L}^2(\mathbb{R})$ but not in $\mathcal{L}^1(\mathbb{R})$, while the function equal to $|t|^{-1/2}$ for $0 < |t| \le 1$ and 0 otherwise is in $\mathcal{L}^1(\mathbb{R})$ but not in $\mathcal{L}^2(\mathbb{R})$. Therefore, when inclusion in different $\mathcal{L}^p$ spaces is needed, we restrict attention to functions in the intersection of the spaces to avoid technical difficulties. For example, certain theorems apply only to functions that are both absolutely-integrable and square-integrable, that is, those belonging to $\mathcal{L}^1 \cap \mathcal{L}^2$.



Spaces of Smooth Functions To describe the global smoothness of a function, we use its continuity and the continuity of its derivative(s); these form the $C^q$ spaces we saw in Section 1.2.4. Even a single point where the $q$th derivative does not exist or is not continuous prevents membership in $C^q$. Thus, global smoothness can fail to capture distinctions between important, frequently-encountered types of functions, and those that are quite esoteric. For example, the simple function $u(t) = 1$ for $t \ge 0$ and $u(t) = 0$ for $t < 0$ is infinitely differentiable at every nonzero $t$, but it fails to be even in $C^0$; in terms of global smoothness, it is no different from a function that is discontinuous everywhere. Therefore, to differentiate functions in terms of smoothness, we consider local smoothness as well.

In calculus, it is natural to look at differentiability at various points in the 
domain of the function. In signal processing, on the other hand, it is often preferable 
to define local smoothness using the global smoothness of a windowed version of a 
function. We illustrate this with an example. 

Example 3.1 (Continuous and piecewise linear function) Let $\{x_0, x_1, \ldots, x_L\}$ be a finite sequence of real values with $x_0 = x_L = 0$, as in Figure 3.1(a). Construct a continuous-time function

$$x(t) = \begin{cases} x_n + (t - n)(x_{n+1} - x_n), & n \le t < n + 1,\ n \in \{0, 1, \ldots, L-1\}; \\ 0, & t \notin [0, L), \end{cases} \qquad (3.4a)$$

a linear interpolation between the integer points as in Figure 3.1(b) such that $x(n) = x_n$. This function is in $\mathcal{L}^1$, $\mathcal{L}^2$, and $\mathcal{L}^\infty$, since the sequence $\{x_n\}$ is finite and bounded. In terms of smoothness, looking only at a single linear piece, the function seems to be in $C^\infty$, but since the function is not differentiable at the integers, it is only in $C^0$.
Figure 3.1: (a) A finite sequence of real values $x_n$. (b) The piecewise-linear function $x(t)$ obtained by linearly interpolating $x_n$ via (3.4a). (c) Two different windows $w_T(t)$ from (3.4b) for $T = 2$ and $T = 1/2$. (d) Four windowed versions $y_{T,\tau}$ for $T \in \{1/2, 2\}$ and $\tau \in \{1/8, 3/8\}$. All four are in $C^0$, and only $y_{1/2,3/8}$ is in $C^1$.



We investigate local smoothness using a window that is in $C^1$, for example,

$$w_T(t) = \begin{cases} \frac{1}{2}\left(1 + \cos(2\pi t/T)\right), & |t| \le T/2; \\ 0, & \text{otherwise}, \end{cases} \qquad (3.4b)$$

where $T > 0$. The window is of size $T$ and centered around the origin. It has one continuous derivative, that is, $w_T \in C^1$. Figure 3.1(c) shows $w_T$ for two values of $T$. We will use all shifts of $w_T$.

For any fixed width parameter $T$ and shift parameter $\tau$, define the windowed version of $x(t)$:

$$y_{T,\tau}(t) = x(t)\, w_T(t - \tau). \qquad (3.4c)$$



The global smoothness of $y_{T,\tau}$ varies with the parameters $T$ and $\tau$ and gives us a local smoothness of $x$. As a product of continuous functions, $y_{T,\tau}$ is always continuous (that is, in $C^0$). When $T > 1$, the support of the shifted window always includes at least one integer point, no matter what $\tau$ is; thus, $y_{T,\tau}$ will not be in $C^1$. When $T < 1$, depending on $T$ and $\tau$, some of the windowed versions will be in $C^1$. For example, if $T = 1/2$, then for $\tau \in [n + 1/4, n + 3/4]$ for an integer $n$, the windowed version $y_{T,\tau}$ is in $C^1$, since the support $[\tau - 1/4, \tau + 1/4]$ then lies within $[n, n + 1]$, where $x$ is linear. Figure 3.1(d) illustrates these conclusions with four windowed versions of $x$.
[Plots: (c) $x_3(t) = t^2 \sin(1/t)$; (d) $x_4(t) = (-1)^{\lceil \log_2(1/t) \rceil}$.]

Figure 3.2: Functions illustrating the concept of bounded/unbounded variation. On the interval $[0, 1]$, only $x_3(t)$ is of bounded variation.



Space of Functions of Bounded Variation Functions of bounded variation are easiest to understand when they are also continuous. A continuous function has bounded variation if the length of its graph on any finite interval is finite. While most of the functions we encounter satisfy this criterion, many do not. For example, consider the following functions:

$$x_1(t) = \sin(1/t), \qquad x_2(t) = t \sin(1/t), \qquad x_3(t) = t^2 \sin(1/t),$$

where each is defined to equal 0 for $t = 0$. On the interval $[0, 1]$, $x_3(t)$ is of bounded variation while $x_1(t)$ and $x_2(t)$ are not. Another example is a function $x_4(t)$ defined on the unit interval $[0, 1]$ and having value $\pm 1$ over dyadic intervals:

$$x_4(t) = (-1)^i, \qquad 2^{-i} \le t < 2^{-i+1}, \quad i \in \mathbb{Z}^+, \quad t \in [0, 1],$$

or, equivalently,

$$x_4(t) = (-1)^{\lceil \log_2(1/t) \rceil}.$$

This function is not of bounded variation either. All four functions are shown in Figure 3.2.

Formally, the total variation of a function $x$ over $[a, b]$ is defined as

$$V_a^b(x) = \sup_{N} \sup_{(t_0, t_1, \ldots, t_N)} \sum_{k=0}^{N-1} |x(t_{k+1}) - x(t_k)|,$$

where the second supremum is taken over all increasing sequences $(t_0, t_1, \ldots, t_N)$ in $[a, b]$. Then, a real-valued $x(t)$ is said to be of bounded variation over $[a, b]$ when $V_a^b(x)$ is finite.
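Each sum in the definition of $V_a^b(x)$ is a lower bound on the total variation. Evaluating the sum on finer and finer uniform partitions of $[0, 1]$ therefore gives a numerical hint, though not a proof, about the claims for $x_2$ and $x_3$. In the sketch below, the sums for $x_2(t) = t \sin(1/t)$ should keep growing, roughly logarithmically in the number of intervals. The sums for $x_3(t) = t^2 \sin(1/t)$ should settle. The partition sizes are arbitrary choices.

```python
import numpy as np

def sampled_variation(f, num_intervals):
    """Sum of |x(t_{k+1}) - x(t_k)| over a uniform partition of [0, 1].
    Every such sum is a lower bound on the total variation V_0^1(x)."""
    t = np.linspace(0.0, 1.0, num_intervals + 1)
    x = np.zeros_like(t)
    x[1:] = f(t[1:])  # each function is defined to equal 0 at t = 0
    return np.sum(np.abs(np.diff(x)))

def x2(t):
    return t * np.sin(1.0 / t)

def x3(t):
    return t ** 2 * np.sin(1.0 / t)

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, sampled_variation(x2, n), sampled_variation(x3, n))
```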
Special Functions

We now introduce the functions most often used in the book. 

Dirac Delta Function The Dirac delta function satisfies

$$\delta(t) = 0 \quad \text{for all nonzero } t \qquad (3.5a)$$

and

$$\int_{-\infty}^{\infty} \delta(t)\, dt = 1. \qquad (3.5b)$$



Since no function can actually satisfy these properties, the Dirac delta is not a 
function and is sometimes called a generalized function. It is commonly used in 
signal processing to enable concise expressions that would otherwise require limits 
or integrations. 

The Dirac delta function can be heuristically⁶⁹ related to a limit of a sequence of functions, where one suitable sequence is given by

$$d_n(t) = \begin{cases} n/2, & |t| \le 1/n; \\ 0, & \text{otherwise}, \end{cases} \qquad n = 1, 2, \ldots \qquad (3.6)$$

While the sequence of functions does not converge in $\mathcal{L}^2$ norm (see Exercise 3.1), we do have consistency with properties (3.5):

$$\lim_{n \to \infty} d_n(t) = 0 \quad \text{for any nonzero } t$$

and

$$\int_{-\infty}^{\infty} d_n(t)\, dt = 1.$$

Other properties of limits of integrals involving $d_n$ are explored in Exercise 3.1, including an interpretation of derivatives of Dirac delta functions. Table 3.1 lists some properties of the Dirac delta function. (The shifting property uses convolution, which is defined in (3.36). For the sifting and sampling properties to hold, $x(t)$ must be continuous at $t_0$ and 0, respectively.)
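The sketch below approximates $\int x(t)\, d_n(t)\, dt$ with a midpoint Riemann sum. It assumes the test function $x(t) = \cos t + t^2$ (our choice), which is continuous at 0 with $x(0) = 1$. As $n$ grows, the values should approach $x(0)$, in line with the sampling and sifting behavior listed in Table 3.1. The integral of $d_n$ itself stays at 1. The helper name `d_n_integral` is ours.

```python
import numpy as np

def d_n_integral(x, n, num=10_000):
    """Midpoint-rule approximation of the integral of x(t) d_n(t) dt, where
    d_n(t) = n/2 for |t| <= 1/n and 0 otherwise, as in (3.6)."""
    edges = np.linspace(-1.0 / n, 1.0 / n, num + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    width = 2.0 / (n * num)
    return (n / 2) * np.sum(x(mids)) * width

def x(t):
    return np.cos(t) + t ** 2

for n in (1, 10, 100, 1000):
    # second column: the integral of d_n alone (test function identically 1)
    print(n, d_n_integral(x, n), d_n_integral(np.ones_like, n))
```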

Another function often used in signal processing, especially to describe sampling, is the Dirac delta comb or picket fence function. It is a sum of Dirac delta functions uniformly spaced at locations $nT$:

$$s_T(t) = \sum_{n \in \mathbb{Z}} \delta(t - nT). \qquad (3.7)$$



⁶⁹ A formal derivation of the Dirac generalized function requires knowledge of the theory of distributions. In particular, the Dirac function is defined indirectly through its action on a test function, such as the sifting property in Table 3.1. Thus, while (3.6) is an engineering trick, it is most useful in all nonpathological cases.
Dirac delta function: $\int_{-\infty}^{\infty} \delta(t)\, dt = 1$

Sifting: $\int_{-\infty}^{\infty} x(t_0 - t)\, \delta(t)\, dt = \int_{-\infty}^{\infty} x(t)\, \delta(t_0 - t)\, dt = x(t_0)$

Shifting: $x(t) * \delta(t - t_0) = x(t - t_0)$

Sampling: $x(t)\, \delta(t) = x(0)\, \delta(t)$

Restriction: $x(t)\, \delta(t) = (1_{\{0\}} x)(t)\, \delta(t)$

Table 3.1: Properties of the Dirac delta function. 



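Heaviside Function The Heaviside or unit-step function is defined as

$$u(t) = \begin{cases} 1, & t \ge 0; \\ 0, & \text{otherwise}. \end{cases} \qquad (3.8)$$

This function is bounded by 1, so it belongs to $\mathcal{L}^\infty(\mathbb{R})$. It belongs to neither $\mathcal{L}^1(\mathbb{R})$ nor $\mathcal{L}^2(\mathbb{R})$. The Dirac delta and Heaviside functions are related via

$$u(t) = \int_{-\infty}^{t} \delta(\tau)\, d\tau. \qquad (3.9)$$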

Pointwise multiplication by the Heaviside function implements the domain restriction operator (1.58) for restriction from all real numbers to just the nonnegative real numbers:

$$(1_{\mathbb{R}^+} x)(t) = \begin{cases} x(t), & t \ge 0; \\ 0, & \text{otherwise} \end{cases} = u(t)\, x(t), \qquad t \in \mathbb{R}.$$

From this we can also build other domain restriction operators. For example, domain restriction to $[t_0, t_1)$ is achieved with a difference of two shifted Heaviside functions:

$$(1_{[t_0, t_1)} x)(t) = \big(u(t - t_0) - u(t - t_1)\big)\, x(t) = \begin{cases} x(t), & t_0 \le t < t_1; \\ 0, & \text{otherwise}. \end{cases} \qquad (3.10)$$



Box Function For any positive real number $t_0$, the box function⁷⁰ is given by

$$\chi_{[-t_0/2,\, t_0/2]}(t) = \begin{cases} 1, & |t| \le t_0/2; \\ 0, & \text{otherwise}, \end{cases} \qquad (3.11)$$

that is, it is an indicator function of the interval $[-t_0/2, t_0/2]$. When $t_0 = 1$, it is of unit norm and its integral $\int_{t \in \mathbb{R}} x(t)\, dt$ equals 1. The box function and the sinc function are intimately related; they form a Fourier-transform pair, as we will see later.

The box function can be expressed in terms of the Heaviside function as

$$\chi_{[-t_0/2,\, t_0/2]}(t) = u(t + t_0/2) - u(t - t_0/2). \qquad (3.12)$$

This is often called a centered and normalized box function.
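Gaussian Function A Gaussian function is defined as

$$g(t) = \gamma e^{-\alpha (t - \mu)^2}, \qquad (3.13a)$$

where $\mu$ shifts the center of the function to $t = \mu$, and $\alpha$ and $\gamma$ are positive constants. When $\alpha = 1/(2\sigma^2)$ and $\gamma = 1/(\sigma\sqrt{2\pi})$, $\|g\|_1 = 1$, and thus $g$ can be seen as a probability density function, with $\mu$ and $\sigma$ interpreted as the mean and standard deviation, respectively:

$$\|g(t)\|_1 = \int_{-\infty}^{\infty} |g(t)|\, dt = \int_{-\infty}^{\infty} \gamma e^{-\alpha (t-\mu)^2}\, dt = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(t-\mu)^2/(2\sigma^2)}\, dt = 1. \qquad (3.13b)$$

When $\gamma = (2\alpha/\pi)^{1/4}$, $\|g\| = 1$, that is, $g$ is of unit norm (energy):

$$\|g(t)\|^2 = \int_{-\infty}^{\infty} |g(t)|^2\, dt = \int_{-\infty}^{\infty} \gamma^2 e^{-2\alpha (t-\mu)^2}\, dt = \sqrt{\frac{2\alpha}{\pi}} \int_{-\infty}^{\infty} e^{-2\alpha (t-\mu)^2}\, dt = 1. \qquad (3.13c)$$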
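Here is a numerical check of (3.13b) and (3.13c). It uses arbitrary test values $\mu = 1.5$ and $\sigma = 0.8$, so that $\alpha = 1/(2\sigma^2)$. The integrals are approximated by Riemann sums over $\mu \pm 20\sigma$, where the tails are negligible.

```python
import numpy as np

def gaussian(t, gamma, alpha, mu):
    """g(t) = gamma exp(-alpha (t - mu)^2), as in (3.13a)."""
    return gamma * np.exp(-alpha * (t - mu) ** 2)

mu, sigma = 1.5, 0.8                      # arbitrary test values
alpha = 1.0 / (2.0 * sigma ** 2)
t = np.linspace(mu - 20 * sigma, mu + 20 * sigma, 200_001)
dt = t[1] - t[0]

# (3.13b): gamma = 1 / (sigma sqrt(2 pi)) gives unit L1 norm
g1 = gaussian(t, 1.0 / (sigma * np.sqrt(2 * np.pi)), alpha, mu)
l1 = np.sum(np.abs(g1)) * dt

# (3.13c): gamma = (2 alpha / pi)^(1/4) gives unit L2 norm (energy)
g2 = gaussian(t, (2 * alpha / np.pi) ** 0.25, alpha, mu)
l2 = np.sqrt(np.sum(np.abs(g2) ** 2) * dt)

print(l1, l2)  # both approximately 1
```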



Deterministic Correlation 

We now discuss two operations on functions, both deterministic, that appear throughout the chapter; these are analogous to the notions of deterministic autocorrelation and crosscorrelation for sequences defined in Section 2.2. Stochastic versions of both operations will be given in Section 3.6.1.

Deterministic Autocorrelation The deterministic autocorrelation $a$ of a function $x$ is

$$a(t) = \int_{-\infty}^{\infty} x(\tau)\, x^*(\tau - t)\, d\tau = \langle x(\tau), x(\tau - t) \rangle_\tau. \qquad (3.14)$$

The deterministic autocorrelation satisfies

$$a(t) = a^*(-t), \qquad (3.15a)$$
$$a(0) = \int_{-\infty}^{\infty} |x(\tau)|^2\, d\tau = \|x\|^2, \qquad (3.15b)$$

analogously to (2.17). The deterministic autocorrelation measures the similarity of a function with respect to shifts of itself, and it is Hermitian symmetric as in (3.15a). For a real $x$,

$$a(t) = \int_{-\infty}^{\infty} x(\tau)\, x(\tau - t)\, d\tau = a(-t). \qquad (3.15c)$$

When we need to specify the function involved, we write $a_x(t)$.
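To make (3.14) and (3.15) concrete, the sketch below approximates $a(t)$ by a Riemann sum on a uniform grid. For the box function of width $t_0 = 1$, the result should be the triangle $\max(1 - |t|, 0)$, with $a(0) = \|x\|^2 = 1$. The complex-valued test function $\sin(\pi t)\, e^{3jt}$ on $[0, 1]$ is our own choice. It illustrates the Hermitian symmetry $a(t) = a^*(-t)$, together with $a(0) = \|x\|^2 = 1/2$. Grid spacing and shifts are arbitrary.

```python
import numpy as np

def autocorrelation(x, shifts, grid):
    """Riemann-sum approximation of a(t) = integral of x(tau) x*(tau - t) dtau,
    as in (3.14), evaluated at each t in `shifts` on the uniform tau `grid`."""
    dtau = grid[1] - grid[0]
    return np.array([np.sum(x(grid) * np.conj(x(grid - s))) * dtau for s in shifts])

def box(t):
    """Box function of width t0 = 1, as in (3.11)."""
    return (np.abs(t) <= 0.5).astype(float)

def windowed_exp(t):
    """Complex test function sin(pi t) e^{3jt} on [0, 1], zero elsewhere."""
    inside = (t >= 0) & (t <= 1)
    return np.where(inside, np.sin(np.pi * t) * np.exp(3j * t), 0)

grid = np.linspace(-4.0, 4.0, 80_001)  # step 1e-4; covers all shifted supports
shifts = np.array([-1.5, -0.5, -0.25, 0.0, 0.25, 0.5, 1.5])

a_box = autocorrelation(box, shifts, grid)        # expect max(1 - |t|, 0)
a_w = autocorrelation(windowed_exp, shifts, grid)

print(np.round(a_box, 3))
# shifts is symmetric, so a_w[::-1] holds a(-t); compare it with a*(t)
print(np.allclose(a_w[::-1], np.conj(a_w), atol=1e-3))  # Hermitian symmetry
```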
Deterministic Crosscorrelation The deterministic crosscorrelation $c$ of two functions $x$ and $y$ is

$$c(t) = \int_{-\infty}^{\infty} x(\tau)\, y^*(\tau - t)\, d\tau = \langle x(\tau), y(\tau - t) \rangle_\tau, \qquad (3.16)$$

and is written as $c_{x,y}(t)$ to specify the functions involved. It satisfies

$$c_{x,y}(t) = \left(\int_{-\infty}^{\infty} y(\tau - t)\, x^*(\tau)\, d\tau\right)^* \overset{(a)}{=} \left(\int_{-\infty}^{\infty} y(\tau')\, x^*(\tau' + t)\, d\tau'\right)^* = c_{y,x}^*(-t), \qquad (3.17a)$$

where (a) follows from the change of variable $\tau' = \tau - t$. For real $x$ and $y$,

$$c_{x,y}(t) = \int_{-\infty}^{\infty} x(\tau)\, y(\tau - t)\, d\tau = c_{y,x}(-t). \qquad (3.17b)$$

3.2.2 Periodic Functions 

Periodic functions with period $T$ satisfy

$$x(t + T) = x(t), \qquad t \in \mathbb{R}. \qquad (3.18)$$

Such functions appear in many physical problems, most notably in the original work of Fourier on heat conduction in a circular wire. In general, such functions do not have a finite $\mathcal{L}^1$ or $\mathcal{L}^2$ norm; instead, we consider functions such that a period is in $\mathcal{L}^1$ or $\mathcal{L}^2$, respectively:

$$\int_{-T/2}^{T/2} |x(t)|\, dt < \infty \quad \text{or} \quad \int_{-T/2}^{T/2} |x(t)|^2\, dt < \infty. \qquad (3.19)$$

As we said earlier, another way to look at these functions is as functions on an interval, circularly extended, similarly to what we have seen in Chapter 2.

3.2.3 Multidimensional Functions 

Multidimensional functions are functions of several variables. A two-dimensional example is a function $x(t_1, t_2)$. While they can have an arbitrary number of dimensions, in signal processing, two- and three-dimensional functions are typical, such as images (two dimensions) or video (three dimensions). An obvious generalization of the one-dimensional case is when these functions are separable. For example, in two dimensions, separable functions have the form

$$x(t_1, t_2) = x_1(t_1)\, x_2(t_2). \qquad (3.20)$$

Clearly, one can apply one-dimensional theory to each factor, such as the factorization of polynomials. While limited, this separable class is quite popular due to its simplicity. For nonseparable functions of multiple variables, certain techniques used in one dimension do not generalize to multiple dimensions. For example, the fundamental theorem of algebra, Theorem 2.19, is stated for polynomials of one variable only. The notion of norms extends directly to multidimensional functions, as do the smoothness classes.
Figure 3.3: A continuous-time system.

3.3 Systems 

Continuous-time systems are operators having continuous-time functions as their 
inputs and outputs. Among all continuous-time systems, we will concentrate on 
those that are linear and shift-invariant. This subclass is both important in practice 
and amenable to easy analysis. After an introduction to differential equations, which 
are natural descriptions of continuous-time systems, we study linear, shift-invariant 
systems in detail. 

3.3.1 Continuous-Time Systems and Their Properties 

A continuous-time system is an operator $T$ that maps an input function $x \in V$ into an output function $y \in V$,

$$y = T(x), \qquad (3.21)$$

as shown in Figure 3.3. As we have seen in the previous section, the function space $V$ is typically $\mathcal{L}^2(\mathbb{R})$ or $\mathcal{L}^\infty(\mathbb{R})$. At times, the input or the output is in a subspace of such spaces.

Types of Systems 

For each of the types of systems seen in the previous chapter, we have a continuous-time counterpart. As the concepts are identical, we just list them for completeness. It is instructive to compare the discrete-time ones against the continuous-time ones.

Linear Systems The definition of linearity of a continuous-time system is similar to Definition 1.17 of a linear operator and Definition 2.1 of a linear discrete-time system:



Definition 3.1 (Linear system) A continuous-time system $T$ is called linear when, for any inputs $x$ and $y$ and any $\alpha, \beta \in \mathbb{C}$,

$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y). \qquad (3.22)$$

The function $T$ is thus a linear operator, and we write (3.21) as

$$y = Tx. \qquad (3.23)$$
Memoryless Systems The definition of a memoryless system closely follows Definition 2.2:

Definition 3.2 (Memoryless system) A continuous-time system $T$ is called memoryless when, for any real $\tau$ and inputs $x$ and $x'$,

$$1_{\{\tau\}}\, x = 1_{\{\tau\}}\, x' \;\Rightarrow\; 1_{\{\tau\}}\, T(x) = 1_{\{\tau\}}\, T(x'). \qquad (3.24)$$



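Causal Systems The output of a causal system at time $t$ depends on the input only up to time $t$. If two inputs agree up to time $\tau$, the corresponding outputs must agree up to time $\tau$:

Definition 3.3 (Causal system) A continuous-time system $T$ is called causal when, for any real $\tau$ and inputs $x$ and $x'$,

$$1_{(-\infty,\tau]}\, x = 1_{(-\infty,\tau]}\, x' \;\Rightarrow\; 1_{(-\infty,\tau]}\, T(x) = 1_{(-\infty,\tau]}\, T(x'). \qquad (3.25)$$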
As discussed in Section 2.3.1, causality can seem to be a property that is required of any real system. When the time variable literally represents time and the same time origin is used for the input and output functions, causality is indeed necessary for accurate models of physical systems.

Shift-Invariant Systems In a shift-invariant system, shifting the input has the effect of shifting the output by the same amount:

Definition 3.4 (Shift-invariant system) A continuous-time system $T$ is called shift invariant when, for any real $\tau$ and input $x$,

$$y = T(x) \;\Rightarrow\; y' = T(x'), \quad \text{where } x'(t) = x(t - \tau) \text{ and } y'(t) = y(t - \tau). \qquad (3.26)$$



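Stable Systems As for discrete-time systems, we define and consider bounded-input, bounded-output (BIBO) stability exclusively:

Definition 3.5 (BIBO stability) A continuous-time system $T$ is called bounded-input, bounded-output stable when a bounded input $x$ produces a bounded output $y = T(x)$:

$$x \in \mathcal{L}^\infty(\mathbb{R}) \;\Rightarrow\; y \in \mathcal{L}^\infty(\mathbb{R}). \qquad (3.27)$$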
Basic Systems

We now discuss a few basic continuous-time systems. 

Shift The shift-by-$t_0$ operator is defined as

$$y(t) = x(t - t_0), \qquad t \in \mathbb{R}, \qquad (3.28)$$

and simply delays $x(t)$ by $t_0$. It is an LSI operator, causal (for $t_0 \ge 0$) and BIBO stable, but not memoryless. While this is one of the simplest continuous-time systems, it is also the most important, as the whole concept of time processing is based on this simple operator. Compare this continuous-time shift operator to the discrete-time one defined in (2.38).

Modulator While the shift we just saw is a shift in time, modulation is a shift in frequency (as we will see later in this chapter). A modulation by a complex exponential of frequency $\omega_0$ is given by

$$y(t) = e^{j\omega_0 t}\, x(t), \qquad t \in \mathbb{R}. \qquad (3.29)$$

This operator is linear, causal, memoryless and BIBO stable, but not shift invariant. For those already familiar with Fourier analysis, (3.29) shifts the spectrum of $x(t)$ to the position $\omega_0$ in frequency. Compare this continuous-time modulation operator to the discrete-time one defined in (2.40).

Integrator Similarly to the accumulator (2.42) in discrete time, an integrator sums up the inputs up to the present time:

$$y(t) = \int_{-\infty}^{t} x(\tau)\, d\tau, \qquad t \in \mathbb{R}. \qquad (3.30)$$

This is an LSI, causal operator, but it is neither memoryless nor BIBO stable.



Averaging Operators As in (2.46), for any fixed $T > 0$, we could consider a system that takes an average of the input,

$$y(t) = \frac{1}{T} \int_{t-T/2}^{t+T/2} x(\tau)\, d\tau, \qquad t \in \mathbb{R}. \qquad (3.31)$$

This is a moving average filter since we look at the function through a window of length $T$. This operator is LSI and BIBO stable, but neither memoryless nor causal. We could obtain a causal version by simply delaying the moving average in (3.31) by $T/2$, resulting in

$$y(t) = \frac{1}{T} \int_{t-T}^{t} x(\tau)\, d\tau, \qquad t \in \mathbb{R}. \qquad (3.32)$$

This operator is again LSI and BIBO stable, but also causal, while still not memoryless.
Maximum Operator This simple operator computes the maximum value of the input up to the current time:

$$y(t) = \max\left(1_{(-\infty, t]}\, x\right). \qquad (3.33)$$

This operator is clearly neither linear nor memoryless, but it is causal, shift invariant, and BIBO stable.

3.3.2 Differential Equations 

While differential equations are typically encountered before difference equations, in our exposition this is not the case. In Chapter 2, we examined the basic principles behind difference equations. As in discrete time, where linear difference equations relate the input sequence and past outputs to the current output, in continuous time, linear differential equations relate the input function and past outputs to the current output. In particular, linear constant-coefficient differential equations (compare to linear constant-coefficient difference equations in (2.54)) describe LSI systems and are of the form:

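$$y(t) = \sum_{k=0}^{M} b_k \frac{d^k x(t)}{dt^k} - \sum_{k=1}^{N} a_k \frac{d^k y(t)}{dt^k}. \qquad (3.34)$$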



To find the solution, we follow the procedure outlined in Section 2.A.2. (1) We find a solution, $y^{(h)}(t)$, to the homogeneous equation by setting the input $x(t)$ in (3.34) to zero. (2) We then find any particular solution, $y^{(p)}(t)$, to (3.34), typically by assuming an output of the same form as the input. (3) Finally, the complete solution is formed by superposition of the solution to the homogeneous equation and the particular solution. The coefficients in the homogeneous solution are found by specifying initial conditions for $y(t)$ and then solving the resulting system of equations. A standard way of finding solutions to differential equations is to use Fourier and Laplace transforms, and thus, we will revisit differential equations once we are equipped with these tools.
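Here is a minimal sketch of the three-step procedure for the first-order case of (3.34), with $N = 1$ and $M = 0$. This case reads $a_1\, dy(t)/dt + y(t) = b_0\, x(t)$. The coefficient values, the unit-step input, and the initial condition $y(0) = 0$ are assumptions chosen for illustration. The homogeneous solution is $C e^{-t/a_1}$. The particular solution for a constant input is the constant $b_0$. The initial condition gives $C = -b_0$. A forward-Euler integration serves only as an independent numerical check.

```python
import math

A1, B0 = 0.5, 2.0   # hypothetical coefficients in a1*y'(t) + y(t) = b0*x(t)


def closed_form(t):
    """Complete solution for a unit-step input x(t) = 1 (t >= 0) and initial condition y(0) = 0."""
    y_h = -B0 * math.exp(-t / A1)   # homogeneous solution C*exp(-t/a1), C = -b0 from y(0) = 0
    y_p = B0                        # particular solution: a constant, same form as the input
    return y_h + y_p


def forward_euler(t_end, dt=1e-4):
    """Numerically integrate y'(t) = (b0*x(t) - y(t)) / a1 with x(t) = 1 and y(0) = 0."""
    y = 0.0
    for _ in range(round(t_end / dt)):
        y += dt * (B0 * 1.0 - y) / A1
    return y


print(closed_form(1.0), forward_euler(1.0))
```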

3.3.3 Linear Shift-Invariant Systems 

Impulse Response 

The impulse response of an LSI continuous-time system is defined with the Dirac 
delta function playing the role that the Kronecker delta sequence plays in discrete 
time: 



Definition 3.6 (Impulse response) A function $h$ is called the impulse response of LSI continuous-time system $T$ when input $\delta$ produces output $h$.



The impulse response $h$ of a causal linear system always satisfies $h(t) = 0$ for all $t < 0$. This is required because, according to (3.25), the output in response to input $\delta$ must match on $(-\infty, 0)$ the output that results from the zero input function, since the two inputs agree on $(-\infty, 0)$. By linearity, that output is zero.

An LSI system is completely specified by its impulse response. In discrete time, this result is simple because the Kronecker delta sequence and its shifts form a basis; see (2.58). We shall not formally prove the continuous-time result here, but rather provide an intuitive explanation.

Assume we can approximate the function $x(t)$ by a linear combination of $d_n(t)$ from (3.6) and its shifts by integer multiples of $2/n$:

$$ x_n(t) = \sum_{k\in\mathbb{Z}} \frac{2}{n}\, x(2k/n)\, d_n(t - 2k/n). \tag{3.35a} $$

Denote the response of the system to input $d_n(t)$ by $h_n(t)$. Then, similarly to (2.58), the response of the system to $x_n(t)$ is expressed by a linear combination of $h_n(t)$ and its shifts by integer multiples of $2/n$:

$$ y_n(t) = \sum_{k\in\mathbb{Z}} \frac{2}{n}\, x(2k/n)\, h_n(t - 2k/n). \tag{3.35b} $$

Now taking $n \to \infty$ causes $d_n \to \delta$ and $h_n \to h$ (by definition). The sums in (3.35) become Riemann integrals, so $x_n \to x$ by the sifting property in Table 3.1, and $y_n \to y$, where $y$ is determined by the input and the impulse response $h$, as we wished to demonstrate. While the use of the Dirac delta function and presumptions of convergence make this merely a heuristic argument, it does capture the main ideas.

Convolution 



To parallel (2.58), we can express an arbitrary input to LSI system $T$ as

$$ x(t) = \int_{-\infty}^{\infty} x(\tau)\,\delta(t-\tau)\,d\tau, $$

by using the sifting property in Table 3.1. Then the output resulting from this input is

$$ y(t) = T\!\left(\int_{-\infty}^{\infty} x(\tau)\,\delta(t-\tau)\,d\tau\right) \overset{(a)}{=} \int_{-\infty}^{\infty} x(\tau)\, T\delta(t-\tau)\,d\tau \overset{(b)}{=} \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau = (h*x)(t), $$

where (a) follows from linearity; and (b) from shift invariance and the definition of impulse response, defining the convolution:



Definition 3.7 (Convolution) The convolution between functions $h$ and $x$ is defined as

$$ (Hx)(t) = (h*x)(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau, \tag{3.36} $$

where $H$ is called the convolution operator associated with $h$.



The convolution equation reduces the problem of solving ordinary differential equations to that of finding impulse responses.
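As a numerical illustration of (3.36), the sketch below approximates the convolution integral with a midpoint Riemann sum on a bounded interval. It assumes the input vanishes outside that interval. It convolves the box function of (3.11) (with $t_0 = 1$) with itself and compares the result with the hat function $1 - |t|$ on $|t| < 1$, a fact the text uses later in Example 3.3. The helper names and step size are hypothetical choices.

```python
def box(t, t0=1.0):
    """Box function of width t0 as in (3.11): 1/sqrt(t0) for |t| < t0/2, zero otherwise."""
    return 1.0 / t0 ** 0.5 if abs(t) < t0 / 2 else 0.0


def hat(t):
    """Hat function: 1 - |t| for |t| < 1, zero otherwise."""
    return 1.0 - abs(t) if abs(t) < 1.0 else 0.0


def convolve_at(h, x, t, tau_min, tau_max, dtau=1e-3):
    """Midpoint-rule estimate of (h*x)(t) = integral of x(tau) h(t - tau) dtau over [tau_min, tau_max].

    The interval must cover the support of x; outside it x is assumed to vanish.
    """
    n = round((tau_max - tau_min) / dtau)
    total = 0.0
    for k in range(n):
        tau = tau_min + (k + 0.5) * dtau
        total += x(tau) * h(t - tau)
    return total * dtau


for t in (-0.75, 0.0, 0.3, 1.2):
    print(t, convolve_at(box, box, t, -1.0, 1.0), hat(t))
```

Because the box has jump discontinuities, the midpoint sum is accurate only to about the step size `dtau`, not better.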



Properties The convolution (3.36) satisfies:

(i) Connection to the inner product

$$ \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau = \langle x(\tau), h^*(t-\tau)\rangle_\tau. \tag{3.37a} $$

(ii) Commutativity

$$ h*x = x*h. \tag{3.37b} $$

(iii) Associativity

$$ g*(h*x) = g*h*x = (g*h)*x. \tag{3.37c} $$

(iv) Deterministic autocorrelation

$$ a(t) = \int_{-\infty}^{\infty} x(\tau)\,x^*(\tau-t)\,d\tau = x(t) *_t x^*(-t). \tag{3.37d} $$



These properties have discrete-time counterparts in (2.61) and are explored further in Solved Exercise 3.2. Note that all the properties of convolution depend on the integrals, whether written explicitly or implicitly, converging. We will not dwell on these technicalities; Appendix 2.A.3, though focused only on discrete time, has a related discussion.

Filters As in Chapter 2, the impulse response is often called a filter, and the convolution is called filtering.

Stability Similarly to Chapter 2, BIBO stability for continuous-time systems is equivalent to the absolute integrability of the impulse response. The proof is similar to the discrete-time case (see Proposition 2.8) and is left for Exercise 3.2.



Proposition 3.8 (BIBO stability) An LSI system is BIBO stable if and only if its impulse response $h(t)$ is absolutely integrable.



Smoothing One key feature of many convolution operators is their smoothing effect. For example, when the impulse response has a nonzero mean (zeroth moment, see (3.63a)), the convolution will compute a local average, as we now show:
Example 3.2 (Local smoothing by convolution) Choose the box function of width $t_0 > 0$, (3.11), as the impulse response $h(t)$, and a piecewise-constant function over integer intervals,

$$ x(t) = x_n, \qquad t \in [n, n+1),\ n \in \mathbb{Z}, $$

as the input (for some sequence $x_n$). The convolution $y = h*x$ is continuous for any $t_0$. For example, the output with $t_0 = 1$ is

$$ y(t) = x_n + \left(t - n - \tfrac{1}{2}\right)(x_{n+1} - x_n), \qquad t \in \left[n + \tfrac{1}{2},\, n + \tfrac{3}{2}\right],\ n \in \mathbb{Z}, $$

which is piecewise linear and continuous. Thus, thanks to a smoothing impulse response, a discontinuous function $x$ is transformed into a $C^0$ function $y$.
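Example 3.2 can be checked directly. With $t_0 = 1$, $(h*x)(t)$ is the integral of $x$ over $[t - 1/2, t + 1/2]$, which is computed exactly below by summing overlaps with the constant pieces. The result is then compared with the piecewise-linear formula. This is a minimal sketch that assumes a finite, hypothetical input sequence with $x$ taken to be zero outside the listed pieces.

```python
import math


def y_direct(xs, t):
    """(h*x)(t) for the unit box h (t0 = 1) and x(t) = xs[n] on [n, n+1), computed exactly.

    With this box, (h*x)(t) is the integral of x over [t - 1/2, t + 1/2]; x is taken
    to be zero outside the listed pieces n = 0, ..., len(xs) - 1.
    """
    a, b = t - 0.5, t + 0.5
    total = 0.0
    for n in range(math.floor(a), math.floor(b) + 1):
        if 0 <= n < len(xs):
            overlap = min(b, n + 1) - max(a, n)   # length of the intersection of [a, b] and [n, n+1)
            total += xs[n] * max(overlap, 0.0)
    return total


def y_formula(xs, t):
    """Piecewise-linear expression from Example 3.2, valid for t in [n + 1/2, n + 3/2]."""
    n = math.floor(t - 0.5)
    return xs[n] + (t - n - 0.5) * (xs[n + 1] - xs[n])


xs = [2.0, -1.0, 3.0, 0.5]   # an arbitrary, hypothetical input sequence x_n
for t in (0.5, 0.9, 1.5, 2.2, 3.4):
    print(t, y_direct(xs, t), y_formula(xs, t))
```

Although `x` jumps at every integer, `y_direct` changes continuously across those points.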



Circular Convolution 

We now consider what happens with our second class of functions, those that are either periodic or of finite length and circularly extended. To start, we assume that the impulse response $h$ is in $\mathcal{L}^1(\mathbb{R})$.



Linear Convolution with Circularly-Extended Signal Given a bounded periodic function $x(t)$ as in (3.18) and a filter with impulse response $h(t)$ in $\mathcal{L}^1(\mathbb{R})$, we can compute the convolution as usual:

$$ y(t) = (h*x)(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,x(t-\tau)\,d\tau. \tag{3.39} $$

Since $x(t)$ is $T$-periodic, $y(t)$ is $T$-periodic as well:

$$ y(t+T) = \int_{-\infty}^{\infty} h(\tau)\,x(t+T-\tau)\,d\tau \overset{(a)}{=} \int_{-\infty}^{\infty} h(\tau)\,x(t-\tau)\,d\tau = y(t), $$

where (a) follows from the periodicity of $x$.

Let us now define a periodized version of $h(t)$ as

$$ h_T(t) = \sum_{n\in\mathbb{Z}} h(t - nT), \tag{3.40} $$

which converges for almost every $t$ because $h \in \mathcal{L}^1(\mathbb{R})$. The new function $h_T$ is $T$-periodic, and its restriction to one period is in $\mathcal{L}^1([-T/2, T/2))$. We now want to show how we can express the convolution in (3.39) in terms of what we will define as a circular convolution:

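$$
\begin{aligned}
(h*x)(t) &= \int_{-\infty}^{\infty} h(\tau)\,x(t-\tau)\,d\tau
\overset{(a)}{=} \sum_{n\in\mathbb{Z}} \int_{nT-T/2}^{nT+T/2} h(\tau)\,x(t-\tau)\,d\tau \\
&\overset{(b)}{=} \sum_{n\in\mathbb{Z}} \int_{-T/2}^{T/2} h(\tau'+nT)\,x(t-\tau'-nT)\,d\tau'
\overset{(c)}{=} \sum_{n\in\mathbb{Z}} \int_{-T/2}^{T/2} h(\tau+nT)\,x(t-\tau)\,d\tau \\
&\overset{(d)}{=} \int_{-T/2}^{T/2} \Big(\sum_{n\in\mathbb{Z}} h(\tau+nT)\Big)\,x(t-\tau)\,d\tau
= \int_{-T/2}^{T/2} h_T(\tau)\,x(t-\tau)\,d\tau = (h_T \circledast x)(t),
\end{aligned} \tag{3.41}
$$

where in (a) we split the real line into segments of length $T$; (b) follows from the change of variable $\tau' = \tau - nT$; (c) follows from the periodicity of $x$ and the change of variable $\tau = \tau'$; and (d) follows from (1.196). The last equality uses the definition (3.40) of $h_T$, since summing over all $n \in \mathbb{Z}$ gives $\sum_n h(\tau + nT) = \sum_n h(\tau - nT)$. The expression above tends to be more convenient as it involves only one period of both $x$ and the periodized version $h_T$ of the impulse response $h$.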

Definition of the Circular Convolution In computing the convolution of a periodic function $x$ with an impulse response $h \in \mathcal{L}^1(\mathbb{R})$, we implicitly defined the circular convolution of a $T$-periodic function $x$ and a $T$-periodic impulse response $h$:

Definition 3.9 (Circular convolution) The circular convolution between $T$-periodic functions $h$ and $x$ is defined as

$$ (Hx)(t) = (h \circledast x)(t) = \int_{-T/2}^{T/2} h(\tau)\,x(t-\tau)\,d\tau, \tag{3.42} $$

where $H$ is called the circular convolution operator associated with $h$.

The result of the circular convolution is again $T$-periodic. While this notion of convolution is independent of that of linear convolution, we have just seen that the two are related when the input function is periodic but the impulse response of the system is not. We made the connection by periodizing that impulse response.
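The identity (3.41) has an exact discrete-time analog that is easy to check on a computer. Convolving an $N$-periodic sequence with a finite-length filter gives the same result as circularly convolving one period with the filter periodized modulo $N$. The sketch below illustrates that discrete analog, not the continuous-time integral itself. The function names and data are hypothetical.

```python
def periodize(h, N):
    """Discrete analog of (3.40): h_N[m] = sum over n of h[m + n*N], m = 0..N-1 (h indexed from 0)."""
    hN = [0.0] * N
    for k, hk in enumerate(h):
        hN[k % N] += hk
    return hN


def circular_convolve(h, x):
    """Circular convolution: sum_{k=0}^{N-1} h[k] * x[(m - k) mod N] for two length-N sequences."""
    N = len(x)
    return [sum(h[k] * x[(m - k) % N] for k in range(N)) for m in range(N)]


def linear_convolve_periodic(h, x_period, m):
    """y[m] = sum_k h[k] * x[m - k], with x the N-periodic extension of x_period."""
    N = len(x_period)
    return sum(hk * x_period[(m - k) % N] for k, hk in enumerate(h))


x_period = [1.0, 2.0, 0.0, -1.0]              # one period, N = 4 (hypothetical data)
h = [0.5, 0.25, 0.125, 1.0, 2.0, 3.0, 4.0]   # filter longer than one period
direct = [linear_convolve_periodic(h, x_period, m) for m in range(4)]
via_periodized = circular_convolve(periodize(h, 4), x_period)
print(direct, via_periodized)
```

Grouping the filter taps by their index modulo $N$ plays the role of steps (a) through (d) in (3.41).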

3.4 Fourier Transform 

As discussed in Section 2.4, as well as at the beginning of this chapter, the ubiquity of the Fourier transform is mostly due to the fact that the complex exponentials are eigenfunctions of LSI systems (convolution operators), and operating in the spaces defined by those eigenfunctions is a natural representation for LSI systems. These facts, as in discrete time, lead to the convolution property, which states that the convolution operator is diagonalized by the Fourier transform. In this section, we review the Fourier transform for functions on the real line.

3.4.1 Definition of the Fourier Transform 

Eigenfunctions of the Convolution Operator We now expand on what we have seen in the introduction as well as in Chapter 2, and demonstrate a fundamental property of LSI systems: complex exponential functions $v_\lambda$ are eigenfunctions of the convolution operator $H$,

$$ H v_\lambda = h * v_\lambda = \lambda\, v_\lambda, \tag{3.43} $$

with the convolution operator as in (3.36). As before, to prove this, we must find these $v_\lambda$ and the corresponding eigenvalues $\lambda$. Not surprisingly, the complex exponential function

$$ v_\omega(t) = e^{j\omega t} \tag{3.44} $$

generates, for each $\omega \in \mathbb{R}$, an entire space $S_\omega = \{\alpha e^{j\omega t} \mid \alpha \in \mathbb{C}\}$. The quantity $\omega$ is called angular frequency; it is measured in radians per second. With $\omega = 2\pi f$, the quantity $f$ is called frequency; it is measured in hertz, or the number of cycles per second. Let us now check what happens if we convolve $h$ with $v_\omega$ as in (3.43):

$$ H v_\omega = h * v_\omega = \int_{-\infty}^{\infty} v_\omega(t-\tau)\,h(\tau)\,d\tau = \int_{-\infty}^{\infty} e^{j\omega(t-\tau)}\,h(\tau)\,d\tau = \left(\int_{-\infty}^{\infty} h(\tau)\,e^{-j\omega\tau}\,d\tau\right) e^{j\omega t}. \tag{3.45} $$

Indeed, applying the convolution operator $H$ to the complex exponential function $v_\omega(t) = e^{j\omega t}$ results in the same function, albeit scaled by the corresponding eigenvalue $\lambda_\omega$. We call that eigenvalue the frequency response of the system. Although we have already seen it in the introduction, we repeat it here for completeness:

$$ H(\omega) = \lambda_\omega = \int_{-\infty}^{\infty} h(\tau)\,e^{-j\omega\tau}\,d\tau. \tag{3.46} $$

We can thus rewrite (3.45) as

$$ H e^{j\omega t} = h * e^{j\omega t} = H(\omega)\,e^{j\omega t}. \tag{3.47} $$

The above is true for any $v \in S_\omega$, and thus, that space does not change (it is invariant) under the operation of convolution.
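The eigenfunction property (3.47) can be verified numerically for a concrete filter. This sketch assumes the hypothetical impulse response $h(t) = e^{-t}$ for $t \ge 0$ (zero otherwise), whose frequency response (3.46) works out by hand to $H(\omega) = 1/(1 + j\omega)$. It approximates $h * e^{j\omega t}$ with a truncated midpoint sum and compares the result with $H(\omega)e^{j\omega t}$.

```python
import cmath
import math


def h(tau):
    """Hypothetical example filter: h(t) = exp(-t) for t >= 0 and 0 otherwise."""
    return math.exp(-tau) if tau >= 0 else 0.0


def convolve_with_exponential(omega, t, tau_max=40.0, dtau=1e-3):
    """Midpoint-rule estimate of (h * v_omega)(t) = integral of h(tau) exp(j omega (t - tau)) dtau.

    h vanishes for tau < 0 and is negligible beyond tau_max, so only [0, tau_max] is summed.
    """
    n = round(tau_max / dtau)
    total = 0j
    for k in range(n):
        tau = (k + 0.5) * dtau
        total += h(tau) * cmath.exp(1j * omega * (t - tau))
    return total * dtau


omega, t = 2.0, 0.7
H = 1.0 / (1.0 + 1j * omega)   # (3.46) evaluated by hand for this h
print(convolve_with_exponential(omega, t), H * cmath.exp(1j * omega * t))
```

The output is the same complex exponential, scaled by the single complex number $H(\omega)$.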
Fourier Transform We are now ready to define the Fourier transform, which amounts to projecting $x$ onto each of the $S_\omega$:

Definition 3.10 (Fourier transform) The Fourier transform of a function $x(t)$ is

$$ X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt, \qquad \omega \in \mathbb{R}. \tag{3.48a} $$

It exists when (3.48a) converges for all $\omega \in \mathbb{R}$; we then call it the spectrum of $x$. The inverse Fourier transform of $X(\omega)$ is

$$ x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\,e^{j\omega t}\,d\omega, \qquad t \in \mathbb{R}. \tag{3.48b} $$

When the Fourier transform exists, we denote the Fourier-transform pair as

$$ x(t) \;\overset{\text{FT}}{\longleftrightarrow}\; X(\omega). $$
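A direct numerical reading of (3.48a) is a Riemann sum over a truncated time axis. The sketch below is a minimal version that assumes the signal is negligible outside $[-20, 20]$. It applies the sum to $x(t) = e^{-t^2}$ and compares the result with the Gaussian pair $e^{-\alpha t^2} \leftrightarrow \sqrt{\pi/\alpha}\,e^{-\omega^2/(4\alpha)}$ listed in Table 3.2, with $\alpha = 1$.

```python
import cmath
import math


def fourier_transform(x, omega, t_min=-20.0, t_max=20.0, dt=1e-3):
    """Midpoint-rule estimate of (3.48a), truncated to [t_min, t_max] (x assumed negligible outside)."""
    n = round((t_max - t_min) / dt)
    total = 0j
    for k in range(n):
        t = t_min + (k + 0.5) * dt
        total += x(t) * cmath.exp(-1j * omega * t)
    return total * dt


def gaussian(t):
    return math.exp(-t * t)


for omega in (0.0, 1.0, 3.0):
    exact = math.sqrt(math.pi) * math.exp(-omega ** 2 / 4)   # Gaussian pair with alpha = 1
    print(omega, fourier_transform(gaussian, omega), exact)
```

For smooth, rapidly decaying signals this simple sum is very accurate. For slowly decaying or discontinuous signals, the truncation and step size matter much more.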



3.4.2 Existence and Convergence of the Fourier Transform 

The existence, convergence, and inversion of the Fourier transform of a function will depend strongly on its properties, such as the space to which it belongs. We will concentrate on basic cases where precise statements can be made without too much technical baggage. However, it should be understood that the validity of many of the relations we state is much wider; this will be indicated later.

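Functions in $\mathcal{L}^1(\mathbb{R})$ If $x \in \mathcal{L}^1(\mathbb{R})$, then the integral defining $X(\omega)$ converges for every $\omega$, and $X(\omega)$ is bounded and continuous (Solved Exercise 3.3). If, in addition, $X(\omega) \in \mathcal{L}^1(\mathbb{R})$, the inversion formula (3.48b) holds, and $x(t)$ is then also bounded and continuous (up to a set of measure zero). The proof of the inversion formula for $x \in \mathcal{L}^1(\mathbb{R})$ is somewhat technical and can be found in [100].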

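Example 3.3 (Fourier transform of the hat function) Consider the hat function:

$$ x(t) = \begin{cases} 1 - |t|, & |t| < 1; \\ 0, & \text{otherwise}. \end{cases} \tag{3.49a} $$

Its Fourier transform is

$$ X(\omega) = \int_{-1}^{0} (t+1)\,e^{-j\omega t}\,dt + \int_{0}^{1} (1-t)\,e^{-j\omega t}\,dt. \tag{3.49b} $$

Pulling the constant terms together:

$$ \int_{-1}^{1} e^{-j\omega t}\,dt = \left.\frac{e^{-j\omega t}}{-j\omega}\right|_{-1}^{1} = \frac{e^{j\omega} - e^{-j\omega}}{j\omega}. \tag{3.49c} $$

Then,

$$ \int_{0}^{1} (-t)\,e^{-j\omega t}\,dt \overset{(a)}{=} \frac{1}{\omega^2}\int_{0}^{-j\omega} u\,e^{u}\,du \overset{(b)}{=} \frac{1}{\omega^2}\,(u-1)\,e^{u}\Big|_{0}^{-j\omega} = \frac{1 - (1+j\omega)\,e^{-j\omega}}{\omega^2}, \tag{3.49d} $$

where (a) follows from the change of variable $u = -j\omega t$; and (b) from the fact that the primitive of $u e^{u}$ is $(u-1)e^{u}$.⁷¹ By similar arguments,

$$ \int_{-1}^{0} t\,e^{-j\omega t}\,dt = \frac{1 - (1-j\omega)\,e^{j\omega}}{\omega^2}. \tag{3.49e} $$

Summing (3.49c)-(3.49e), we find $X(\omega)$ from (3.49b):

$$ X(\omega) = \frac{2\sin\omega}{\omega} + \frac{2 - 2\cos\omega - 2\omega\sin\omega}{\omega^2} = \frac{2(1-\cos\omega)}{\omega^2} = \left(\mathrm{sinc}\,\frac{\omega}{2}\right)^2. \tag{3.49f} $$

This Fourier transform is absolutely integrable and is thus in $\mathcal{L}^1(\mathbb{R})$. The hat function and its Fourier transform are depicted in Figure 3.4.

Figure 3.4: The hat function (a) and its Fourier transform (b).

The hat function can also be seen as the convolution of two box functions from (3.11) with $t_0 = 1$. As we will see in Example 3.4, the Fourier transform of such a box function is $\mathrm{sinc}(\omega/2)$. Then, by the convolution property (3.64), the Fourier transform of a convolution is the product of the Fourier transforms, leading to (3.49f).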
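The closed form (3.49f) can be cross-checked by evaluating (3.49b) numerically. The minimal sketch below uses a midpoint sum over $[-1, 1]$ and the sinc convention of the text, $\mathrm{sinc}(x) = \sin(x)/x$. The grid places the kink of the hat at $t = 0$ on a cell boundary, so each cell sees a smooth integrand. The function names are illustrative.

```python
import cmath
import math


def sinc(x):
    """sinc(x) = sin(x)/x with sinc(0) = 1, the convention used in the text."""
    return 1.0 if x == 0 else math.sin(x) / x


def hat_ft_numeric(omega, dt=1e-4):
    """Midpoint-rule evaluation of (3.49b): integral over [-1, 1] of (1 - |t|) exp(-j omega t) dt."""
    n = round(2.0 / dt)
    total = 0j
    for k in range(n):
        t = -1.0 + (k + 0.5) * dt
        total += (1.0 - abs(t)) * cmath.exp(-1j * omega * t)
    return total * dt


for omega in (0.5, 2.0, 2 * math.pi, 7.3):
    closed = 2 * (1 - math.cos(omega)) / omega ** 2   # intermediate form in (3.49f)
    print(omega, hat_ft_numeric(omega).real, closed, sinc(omega / 2) ** 2)
```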

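Functions in $\mathcal{L}^2(\mathbb{R})$ If $x \in \mathcal{L}^2(\mathbb{R})$, the inversion formula (3.48b) holds (in the $\mathcal{L}^2$ sense). Moreover, the $\mathcal{L}^2$ norm is conserved (up to a constant), since

$$ \|x(t)\|^2 = \frac{1}{2\pi}\,\|X(\omega)\|^2. \tag{3.50} $$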



71 A function $x(t)$ is a primitive of $y(t)$ if $x'(t) = y(t)$. We will denote that primitive function with a superscript $(-1)$, as in $y^{(-1)}(t)$.
Figure 3.5: The (a) box function and (b) its Fourier transform (for $t_0 = 1$).



This is Parseval's equality,⁷² familiar from earlier chapters, and it is formally given in (3.69a).

The extension from $\mathcal{L}^1$ to $\mathcal{L}^2$ is technically nontrivial, since when $x(t)$ is not absolutely integrable, then neither is $x(t)e^{-j\omega t}$. We refer to [100], [19] for a thorough discussion of this topic, and only prove Parseval's equality for functions that are in both $\mathcal{L}^1$ and $\mathcal{L}^2$, later in this chapter.

Example 3.4 (Fourier transform of the box function) We now derive the Fourier transform of the box function from (3.11):

$$ X(\omega) = \frac{1}{\sqrt{t_0}}\int_{-t_0/2}^{t_0/2} e^{-j\omega t}\,dt = \frac{1}{-j\omega\sqrt{t_0}}\,e^{-j\omega t}\Big|_{-t_0/2}^{t_0/2} = \frac{e^{j\omega t_0/2} - e^{-j\omega t_0/2}}{j\omega\sqrt{t_0}} = \sqrt{t_0}\,\mathrm{sinc}\!\left(\frac{\omega t_0}{2}\right). \tag{3.51a} $$

This function is not absolutely integrable, since it decays only as $O(1/(1+|\omega|))$. By Parseval's equality, however, it is in $\mathcal{L}^2(\mathbb{R})$, allowing us to use $\mathcal{L}^2$ inversion.⁷³ The function and its Fourier transform for $t_0 = 1$ are shown in Figure 3.5. Using (2.8c), we see that the Fourier transform of the box function is zero at all nonzero integer multiples of $\omega = 2\pi/t_0$:

$$ X\!\left(\frac{2k\pi}{t_0}\right) = \sqrt{t_0}\,\delta_k. \tag{3.51b} $$

As we already mentioned in Example 3.3, the hat function (3.49a) is simply the convolution of the box function (when $t_0 = 1$) with itself; thus,

$$ x * x \;\overset{\text{FT}}{\longleftrightarrow}\; X^2(\omega) = \left(\mathrm{sinc}\,\frac{\omega}{2}\right)^2, $$

that is, its Fourier transform can be obtained from the Fourier transform of the box function using the convolution property (3.64).
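Two claims in Example 3.4 are easy to probe numerically: the zeros (3.51b) and Parseval's equality. The box has unit norm for any $t_0$, so $(1/2\pi)\int|X(\omega)|^2\,d\omega$ should approach 1. The sketch below integrates $|X|^2$ over a finite band $[-W, W]$ with the midpoint rule. Because $|X|^2$ decays only like $1/\omega^2$, the truncated tails contribute an error of order $1/(t_0 W)$. The band limit and step size are assumptions of this check, not part of the text.

```python
import math


def sinc(x):
    """sinc(x) = sin(x)/x with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def box_ft(omega, t0):
    """Closed form (3.51a): X(omega) = sqrt(t0) * sinc(omega * t0 / 2)."""
    return math.sqrt(t0) * sinc(omega * t0 / 2)


def parseval_rhs(t0, W=1000.0, domega=0.02):
    """(1/2pi) * integral over [-W, W] of |X(omega)|^2 by the midpoint rule.

    The truncated tails are of order 1/(t0 * W), since |X|^2 decays like 1/omega^2.
    """
    n = round(2 * W / domega)
    total = 0.0
    for k in range(n):
        omega = -W + (k + 0.5) * domega
        total += box_ft(omega, t0) ** 2
    return total * domega / (2 * math.pi)


t0 = 1.0
print(parseval_rhs(t0))                                       # close to ||x||^2 = 1
print([box_ft(2 * k * math.pi / t0, t0) for k in (1, 2, 3)])  # zeros predicted by (3.51b)
```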



72 Recall that what we call Parseval's equality in this book is sometimes also called Plancherel's equality; what we call generalized Parseval's equality is sometimes called Parseval's theorem.

73 At points of discontinuity, the inversion leads to a midpoint reconstruction. 
Example 3.5 (Fourier transform of the sinc function) We just looked at the box function in time; the dual case is the box function in frequency, which is used to represent an ideal lowpass filter. Such a filter keeps frequencies between $-\omega_0/2$ and $\omega_0/2$ perfectly while suppressing all others, or

$$ H(\omega) = \begin{cases} \sqrt{2\pi/\omega_0}, & |\omega| < \omega_0/2; \\ 0, & \text{otherwise}. \end{cases} \tag{3.52a} $$

Except for scaling, this is dual to Example 3.4, and thus

$$ h(t) = \frac{1}{\sqrt{2\pi\omega_0}}\int_{-\omega_0/2}^{\omega_0/2} e^{j\omega t}\,d\omega = \sqrt{\frac{\omega_0}{2\pi}}\,\mathrm{sinc}\!\left(\frac{\omega_0 t}{2}\right). \tag{3.52b} $$

As in its discrete-time counterpart in Table 2.5, the factor $\sqrt{\omega_0/2\pi}$ is present to make $h(t)$ unit norm. Using (2.8c):

$$ h\!\left(\frac{2k\pi}{\omega_0}\right) = \sqrt{\frac{\omega_0}{2\pi}}\,\delta_k, \tag{3.52c} $$

that is, $h(t)$ is zero at all integer multiples of $T = 2\pi/\omega_0$, except at the origin.

Convergence The history of Fourier analysis and Fourier series is marked by results showing potential problems with convergence. While for continuous functions in $\mathcal{L}^1$ or $\mathcal{L}^2$ things are relatively straightforward, the same is not true for functions from other spaces. Yet, the construction of such functions (for example, Weierstrass's continuous but nowhere differentiable function) led to a more fundamental understanding not only of the notion of a function, but also of subtle concepts such as Brownian motion, studied at the time.

Various forms of convergence of the Fourier transform or its inverse are possible; the three main ones are pointwise convergence, uniform convergence, and convergence in norm (see Appendix 1.A.2). A question of both theoretical and practical interest is the following: If a function $x(t)$ has Fourier transform $X(\omega)$, and we compute the inverse Fourier transform $\tilde{x}(t)$ from $X(\omega)$, when is $\tilde{x}(t) = x(t)$, and in what sense (for example, almost everywhere, in norm)?

(i) If $x(t) \in \mathcal{L}^1(\mathbb{R})$ and $X(\omega) \in \mathcal{L}^1(\mathbb{R})$, then $\tilde{x}(t) = x(t)$ almost everywhere. If, in addition, $x(t)$ is continuous, then

$$ \tilde{x}(t) = x(t) \quad \text{for all } t. \tag{3.53a} $$

This follows from $\mathcal{L}^1$ inversion, by adding the fact that two continuous functions that are almost everywhere equal are necessarily equal everywhere ($\tilde{x}$ is continuous because $X \in \mathcal{L}^1(\mathbb{R})$).

(ii) If $x(t) \in \mathcal{L}^2(\mathbb{R})$, then

$$ \|\tilde{x}(t) - x(t)\| = 0, \tag{3.53b} $$

since the Fourier transform is a unitary map, up to scaling (see (3.50)).
A single discontinuity in a function leads to a Fourier transform that is not absolutely integrable. Its inverse Fourier transform does not converge uniformly. This behavior, both famous and annoying, is named after Gibbs, who described it in the late 19th century. We have already seen an example in Figure 0.3; for the discrete version of this phenomenon, see Section 2.4.2.

Example 3.6 (Gibbs phenomenon) Consider a function $x(t)$ smooth everywhere except for a single step discontinuity. Without loss of generality, assume this discontinuity to be at the origin and of height 1. Therefore, we can write $x(t)$ as

$$ x(t) = x_s(t) + u(t), \tag{3.54a} $$

where $x_s(t)$ is smooth everywhere and $u(t)$ is the Heaviside function (3.8) (see Figure 3.6).

Now consider $X(\omega)$ and its restriction to $[-\omega_0/2, \omega_0/2]$,

$$ X_{\omega_0}(\omega) = X(\omega)\,\chi_{[-\omega_0/2,\,\omega_0/2]}(\omega) = \begin{cases} X(\omega), & |\omega| < \omega_0/2; \\ 0, & \text{otherwise}. \end{cases} \tag{3.54b} $$

This can be seen as a lowpass version of $X(\omega)$, that is, in time,

$$ x_{\omega_0} = h_{\omega_0} * x = h_{\omega_0} * (x_s + u) = h_{\omega_0} * x_s + h_{\omega_0} * u, \tag{3.54c} $$

where $h_{\omega_0}(t)$ is the ideal lowpass filter from (3.52b), rescaled to unit gain in the passband so that (3.54b) holds, that is, $h_{\omega_0}(t) = (\omega_0/2\pi)\,\mathrm{sinc}(\omega_0 t/2)$. The question now is how and if $x_{\omega_0}(t)$ converges to $x(t)$ as $\omega_0 \to \infty$. First, we can ignore the smooth part $h_{\omega_0} * x_s$; it will converge nicely as its Fourier transform decays rapidly. The problematic part is the reconstruction of $u(t)$ from its lowpass version $u_{\omega_0} = h_{\omega_0} * u$. From (3.54c),

$$ u_{\omega_0}(t) = \int_{-\infty}^{\infty} u(\tau)\,h_{\omega_0}(t-\tau)\,d\tau \overset{(a)}{=} \frac{\omega_0}{2\pi}\int_{0}^{\infty} \mathrm{sinc}\!\left(\frac{\omega_0 (t-\tau)}{2}\right) d\tau \overset{(b)}{=} \frac{1}{\pi}\int_{-\infty}^{\omega_0 t/2} \mathrm{sinc}(\tau')\,d\tau', $$

where (a) follows from (3.52b), rescaled as above, and from $u(\tau) = 0$ for $\tau < 0$; and (b) from the change of variable $\tau' = \omega_0(t-\tau)/2$.

Recall the shape of the sinc function (see Figure 3.5(b)): $\mathrm{sinc}(\tau)$ has a maximum at $\tau = 0$, and zero crossings at nonzero integer multiples of $\pi$. Consider now the above integral. When $\omega_0 t/2 \to -\infty$, the integral vanishes, while as $\omega_0 t/2 \to \infty$, it goes to 1, since $\int_{-\infty}^{\infty}\mathrm{sinc}(\tau)\,d\tau = \pi$. In between, it oscillates, with local extrema where $\mathrm{sinc}(\tau')$ changes sign, that is, at $\tau' = k\pi$, or $t = 2k\pi/\omega_0$, $k \neq 0$. The largest oscillations are right around the origin, where the sinc function has the largest changes (see Figure 3.7). This overshoot is roughly 9% for the local maximum right next to the origin, and the oscillations decay as $O(1/|\tau'|)$ farther from the origin.

An interesting twist is that the height of the overshoot does not depend on $\omega_0$; only its location does. As $\omega_0 \to \infty$, even though the time axis gets compressed, the same overshoot and undershoot remain. Thus, the maximum error due to these oscillations stays constant regardless of how large $\omega_0$ gets. This is one of the fundamental problems in Fourier analysis of discontinuous functions.
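The roughly 9% figure can be reproduced from the last integral in Example 3.6. Because $\int_{-\infty}^{0}\mathrm{sinc}(r)\,dr = \pi/2$, the lowpass step is $u_{\omega_0}(t) = 1/2 + \mathrm{Si}(\omega_0 t/2)/\pi$, where $\mathrm{Si}$ is the sine integral. Its first maximum, at $t = 2\pi/\omega_0$, takes the same value for every $\omega_0$. The sketch below assumes Simpson's rule is accurate enough for $\mathrm{Si}$ on short intervals. The function names are illustrative.

```python
import math


def si(x, n=2000):
    """Sine integral Si(x) = integral_0^x sin(r)/r dr, by composite Simpson's rule (n even)."""
    if x == 0:
        return 0.0

    def f(r):
        return 1.0 if r == 0 else math.sin(r) / r

    step = x / n
    total = f(0.0) + f(x)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * step)
    return total * step / 3


def lowpass_step(t, omega0):
    """u_{omega0}(t) = (1/pi) * integral_{-inf}^{omega0 t / 2} sinc(r) dr = 1/2 + Si(omega0 t / 2) / pi.

    Uses integral_{-inf}^{0} sinc(r) dr = pi/2.
    """
    return 0.5 + si(omega0 * t / 2) / math.pi


for omega0 in (10.0, 100.0, 1000.0):
    t_peak = 2 * math.pi / omega0          # first local maximum, where omega0 t / 2 = pi
    print(omega0, t_peak, lowpass_step(t_peak, omega0))
```

The peak value is about 1.0895 for every $\omega_0$, an overshoot of close to 9% of the unit step, while its location $2\pi/\omega_0$ moves toward the origin.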
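| FT properties | Time domain | FT domain |
|---|---|---|
| Basic properties | | |
| Linearity | $\alpha x(t) + \beta y(t)$ | $\alpha X(\omega) + \beta Y(\omega)$ |
| Shift in time | $x(t - t_0)$ | $e^{-j\omega t_0} X(\omega)$ |
| Shift in frequency | $e^{j\omega_0 t} x(t)$ | $X(\omega - \omega_0)$ |
| Scaling in time and frequency | $x(\alpha t)$ | $(1/\lvert\alpha\rvert)\, X(\omega/\alpha)$ |
| Time reversal | $x(-t)$ | $X(-\omega)$ |
| Differentiation in time | $d^n x(t)/dt^n$ | $(j\omega)^n X(\omega)$ |
| Differentiation in frequency | $(-jt)^n x(t)$ | $d^n X(\omega)/d\omega^n$ |
| Integration | $\int_{-\infty}^{t} x(\tau)\,d\tau$ | $X(\omega)/(j\omega)$, with $X(0) = 0$ |
| Moments | $m_k = \int_{-\infty}^{\infty} t^k x(t)\,dt$ | $m_k = j^k\, d^k X(\omega)/d\omega^k$ at $\omega = 0$ |
| Convolution in time | $(h*x)(t)$ | $H(\omega)X(\omega)$ |
| Convolution in frequency | $h(t)x(t)$ | $(1/2\pi)(H*X)(\omega)$ |
| Deterministic autocorrelation | $a(t) = \int_{-\infty}^{\infty} x(\tau)\,x^*(\tau - t)\,d\tau$ | $A(\omega) = \lvert X(\omega)\rvert^2$ |
| Deterministic crosscorrelation | $c(t) = \int_{-\infty}^{\infty} x(\tau)\,y^*(\tau - t)\,d\tau$ | $C(\omega) = X(\omega)Y^*(\omega)$ |
| Parseval's equality | $\lVert x\rVert^2 = \int_{-\infty}^{\infty} \lvert x(t)\rvert^2\,dt$ | $(1/2\pi)\int_{-\infty}^{\infty} \lvert X(\omega)\rvert^2\,d\omega = (1/2\pi)\lVert X\rVert^2$ |
| Symmetries | | |
| Conjugate | $x^*(t)$ | $X^*(-\omega)$ |
| Conjugate, time reversed | $x^*(-t)$ | $X^*(\omega)$ |
| Real part | $\Re(x(t))$ | $(X(\omega) + X^*(-\omega))/2$ |
| Imaginary part | $\Im(x(t))$ | $(X(\omega) - X^*(-\omega))/(2j)$ |
| Conjugate-symmetric part | $(x(t) + x^*(-t))/2$ | $\Re(X(\omega))$ |
| Conjugate-antisymmetric part | $(x(t) - x^*(-t))/(2j)$ | $\Im(X(\omega))$ |
| Symmetries for real $x$ | | |
| $X$ conjugate symmetric | | $X(\omega) = X^*(-\omega)$ |
| Real part of $X$ even | | $\Re(X(\omega)) = \Re(X(-\omega))$ |
| Imaginary part of $X$ odd | | $\Im(X(\omega)) = -\Im(X(-\omega))$ |
| Magnitude of $X$ even | | $\lvert X(\omega)\rvert = \lvert X(-\omega)\rvert$ |
| Phase of $X$ odd | | $\arg X(\omega) = -\arg X(-\omega)$ |
| Common transform pairs | | |
| Dirac delta function | $\delta(t)$ | $1$ |
| Shift in time | $\delta(t - t_0)$ | $e^{-j\omega t_0}$ |
| Dirac delta comb | $\sum_{n\in\mathbb{Z}} \delta(t - nT)$ | $(2\pi/T)\sum_{k\in\mathbb{Z}} \delta(\omega - (2\pi/T)k)$ |
| Exponential function | $e^{-\alpha\lvert t\rvert}$ | $2\alpha/(\omega^2 + \alpha^2)$ |
| Gaussian function | $e^{-\alpha t^2}$ | $\sqrt{\pi/\alpha}\,e^{-\omega^2/(4\alpha)}$ |
| Ideal lowpass filter | $\sqrt{\omega_0/2\pi}\,\mathrm{sinc}(\omega_0 t/2)$ | $\sqrt{2\pi/\omega_0}$ for $\lvert\omega\rvert < \omega_0/2$; 0 otherwise |
| Box function | $1/\sqrt{t_0}$ for $\lvert t\rvert < t_0/2$; 0 otherwise | $\sqrt{t_0}\,\mathrm{sinc}(\omega t_0/2)$ |
| Hat function | $1 - \lvert t\rvert$ for $\lvert t\rvert < 1$; 0 otherwise | $(\mathrm{sinc}(\omega/2))^2$ |

Table 3.2: Properties of the Fourier transform.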
Figure 3.6: Decomposition of a piecewise smooth function with a single discontinuity into a smooth function and a step function. [Panels: piecewise smooth function $x(t)$ = smooth function $x_s(t)$ + unit-step function $u(t)$.]

Figure 3.7: The sinc function and its integral for different values of $\omega_0$. [Panels: (a) $\mathrm{sinc}(t)$; (b) $\int \mathrm{sinc}(\tau)\,d\tau$.]

3.4.3 Properties of the Fourier Transform 

Basic Properties 

We list here the basic properties of the Fourier transform (assuming it exists); Table 3.2 summarizes these, together with symmetries as well as standard transform pairs.



Linearity The Fourier transform operator $F$ is a linear operator, or,

$$ \alpha x(t) + \beta y(t) \;\overset{\text{FT}}{\longleftrightarrow}\; \alpha X(\omega) + \beta Y(\omega). \tag{3.55} $$



Shift in Time The Fourier-transform pair corresponding to a shift in time by $t_0$ is

$$ x(t - t_0) \;\overset{\text{FT}}{\longleftrightarrow}\; e^{-j\omega t_0}\,X(\omega). \tag{3.56} $$



Shift in Frequency The Fourier-transform pair corresponding to a shift in frequency by $\omega_0$ is

$$ e^{j\omega_0 t}\,x(t) \;\overset{\text{FT}}{\longleftrightarrow}\; X(\omega - \omega_0). \tag{3.57} $$

As in Chapter 2, a shift in frequency is often referred to as modulation in time, and is dual to the shift in time.
Scaling in Time and Frequency The Fourier-transform pair corresponding to scaling in time by $\alpha$ is scaling in frequency by $1/\alpha$,

$$ x(\alpha t) \;\overset{\text{FT}}{\longleftrightarrow}\; \frac{1}{|\alpha|}\,X\!\left(\frac{\omega}{\alpha}\right). \tag{3.58a} $$

We often use normalized rescaling, namely, for $\alpha > 0$,

$$ \sqrt{\alpha}\,x(\alpha t) \;\overset{\text{FT}}{\longleftrightarrow}\; \frac{1}{\sqrt{\alpha}}\,X\!\left(\frac{\omega}{\alpha}\right), \tag{3.58b} $$

which conserves the $\mathcal{L}^2$ norm of $x(t)$ (and thus of $X(\omega)$), since

$$ \int_{-\infty}^{\infty} \alpha\,|x(\alpha t)|^2\,dt \overset{(a)}{=} \int_{-\infty}^{\infty} \alpha\,|x(\tau)|^2\,\frac{d\tau}{\alpha} = \|x\|^2, \tag{3.58c} $$

where (a) follows from the change of variable $\tau = \alpha t$. This is another of the key properties of the Fourier transform: a stretch in time is a compaction in frequency, and vice versa.

Time Reversal The Fourier-transform pair corresponding to a time reversal $x(-t)$ is

$$ x(-t) \;\overset{\text{FT}}{\longleftrightarrow}\; X(-\omega). \tag{3.59} $$

For a real $x(t)$, the Fourier transform of the time-reversed version $x(-t)$ is $X^*(\omega)$.

Differentiation in Time The Fourier-transform pair corresponding to differentiation in time is

$$ \frac{d^n x(t)}{dt^n} \;\overset{\text{FT}}{\longleftrightarrow}\; (j\omega)^n X(\omega), \tag{3.60} $$

assuming the derivatives exist and are bounded (see Exercise 3.3), or, alternatively, assuming $\omega^n X(\omega)$ is absolutely integrable.

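Differentiation in Frequency The Fourier-transform pair corresponding to differentiation in frequency is

$$ (-jt)^n\,x(t) \;\overset{\text{FT}}{\longleftrightarrow}\; \frac{d^n X(\omega)}{d\omega^n}. \tag{3.61a} $$

This is obtained by multiple applications of

$$ \int_{-\infty}^{\infty} t\,x(t)\,e^{-j\omega t}\,dt = j\,\frac{d}{d\omega}\int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt = j\,\frac{dX(\omega)}{d\omega}, \tag{3.61b} $$

assuming $t^k x(t)$ is integrable for all $k = 0, 1, \ldots, n$, and using

$$ \frac{d}{d\omega}\,e^{-j\omega t} = -jt\,e^{-j\omega t}. $$

This result is dual to differentiation in time above.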
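Integration The Fourier-transform pair corresponding to integration in time is (with $X(0) = 0$)

$$ \int_{-\infty}^{t} x(\tau)\,d\tau \;\overset{\text{FT}}{\longleftrightarrow}\; \frac{1}{j\omega}\,X(\omega). \tag{3.62} $$

Moments Computing the $k$th moment using the Fourier transform is

$$ m_k = \int_{-\infty}^{\infty} t^k\,x(t)\,dt = j^k \left.\frac{d^k X(\omega)}{d\omega^k}\right|_{\omega=0}, \qquad k \in \mathbb{N}, \tag{3.63a} $$

as a direct application of (3.61a). We give below the first two moments:

$$ m_0 = \int_{-\infty}^{\infty} x(t)\,dt = \left.\int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt\right|_{\omega=0} = X(0), \tag{3.63b} $$

$$ m_1 = \int_{-\infty}^{\infty} t\,x(t)\,dt = \left.\int_{-\infty}^{\infty} t\,x(t)\,e^{-j\omega t}\,dt\right|_{\omega=0} = j\left.\frac{dX(\omega)}{d\omega}\right|_{\omega=0}. \tag{3.63c} $$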
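Here is a minimal numerical sketch of (3.63b) and (3.63c). It computes $X(\omega)$ with a truncated midpoint sum, reads $m_0$ from $X(0)$, and reads $m_1$ from $j\,dX/d\omega$ at $\omega = 0$ using a central difference. The test signal $x(t) = e^{-(t-1)^2}$ is a hypothetical choice whose moments are known in closed form: $m_0 = m_1 = \sqrt{\pi}$. The grid, truncation, and difference step are assumptions of the check.

```python
import cmath
import math


def fourier_transform(x, omega, t_min=-20.0, t_max=22.0, dt=1e-3):
    """Midpoint-rule estimate of (3.48a), truncated to [t_min, t_max]."""
    n = round((t_max - t_min) / dt)
    total = 0j
    for k in range(n):
        t = t_min + (k + 0.5) * dt
        total += x(t) * cmath.exp(-1j * omega * t)
    return total * dt


def x(t):
    """Hypothetical test signal: a Gaussian centered at t = 1, so m0 = m1 = sqrt(pi)."""
    return math.exp(-(t - 1.0) ** 2)


d = 1e-3                                         # step for the central difference
m0 = fourier_transform(x, 0.0)                   # (3.63b): m0 = X(0)
dX0 = (fourier_transform(x, d) - fourier_transform(x, -d)) / (2 * d)
m1 = 1j * dX0                                    # (3.63c): m1 = j dX/domega at omega = 0
print(m0.real, m1.real, math.sqrt(math.pi))
```

The first moment is the "center of mass" of $x$ times its area. Shifting the Gaussian to $t = 1$ therefore shows up only in the phase slope of $X(\omega)$ at the origin.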



Convolution in Time The Fourier-transform pair corresponding to convolution in time is

$$ (h*x)(t) \;\overset{\text{FT}}{\longleftrightarrow}\; H(\omega)\,X(\omega). \tag{3.64} $$

Thus, as in Chapter 2, given a function $x$ and a filter $h$, their convolution maps, in the Fourier domain, to the product of the spectrum of the function and the frequency response of the filter. This result is a direct consequence of the eigenfunction property of complex exponential functions $v_\omega$ from (3.45): since each spectral component is the projection of the function $x$ onto the appropriate invariant space, the Fourier transform diagonalizes the convolution operator. Assume that both $x$ and $h$ are in $\mathcal{L}^1(\mathbb{R})$. Then, $h*x$ is also in $\mathcal{L}^1(\mathbb{R})$, since

$$ \int_{-\infty}^{\infty} |(h*x)(t)|\,dt = \int_{-\infty}^{\infty}\left|\int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau\right| dt \le \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |x(\tau)|\,|h(t-\tau)|\,d\tau\,dt \overset{(a)}{=} \int_{-\infty}^{\infty} |x(\tau)|\left(\int_{-\infty}^{\infty} |h(t-\tau)|\,dt\right) d\tau = \|x\|_1\,\|h\|_1, $$

where (a) follows from Fubini's theorem (see Appendix 1.A.3), allowing for the exchange of the order of integration (allowed since both $x$ and $h$ are in $\mathcal{L}^1(\mathbb{R})$). The spectrum $Y(\omega)$ of the output function $y = h*x$ can be written as

$$
\begin{aligned}
Y(\omega) &= \int_{-\infty}^{\infty} y(t)\,e^{-j\omega t}\,dt = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau\right) e^{-j\omega t}\,dt \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x(\tau)\,e^{-j\omega\tau}\,h(t-\tau)\,e^{-j\omega(t-\tau)}\,d\tau\,dt \\
&\overset{(a)}{=} \int_{-\infty}^{\infty} x(\tau)\,e^{-j\omega\tau}\,d\tau \int_{-\infty}^{\infty} h(t-\tau)\,e^{-j\omega(t-\tau)}\,d(t-\tau) = X(\omega)\,H(\omega),
\end{aligned}
$$

where (a) follows again from Fubini's theorem (see Appendix 1.A.3), allowing for the exchange of the order of integration.

While the convolution property states that convolution in time equals multiplication in frequency, the proof does not really convey the intuition we saw in (3.47), which we now repeat. Suppose the function $x(t)$ has a component $X(\omega_0)$ at frequency $\omega_0$. This frequency will go through the convolution operator without frequency change (the eigenfunction property), but with a weighting $H(\omega_0)$ given by the frequency response of the filter $h(t)$ at frequency $\omega_0$ (the eigenvalue of the eigenfunction of frequency $\omega_0$). Therefore, the output at frequency $\omega_0$ is indeed

$$ Y(\omega_0) = H(\omega_0)\,X(\omega_0). $$

Figure 3.8: Computation of convolution and its equivalent using primitive and derivative instead. (a) $h*x$. (b) $dh(t)/dt * x^{(-1)}$.



Convolution in Frequency The Fourier-transform pair corresponding to convolution in frequency is

$$ h(t)\,x(t) \;\overset{\text{FT}}{\longleftrightarrow}\; \frac{1}{2\pi}\,(H * X)(\omega), \tag{3.65} $$

as is to be expected from the duality of time and frequency.



Example 3.7 (Derivative of a convolution) Consider $y = h*x$ and compute its derivative (which we assume to be well defined),

$$ \frac{dy(t)}{dt} = \frac{d(h*x)(t)}{dt}. \tag{3.66a} $$

The Fourier transform of $dy(t)/dt$, according to (3.60), is

$$ \frac{dy(t)}{dt} \;\overset{\text{FT}}{\longleftrightarrow}\; j\omega\,H(\omega)\,X(\omega). \tag{3.66b} $$

Since the $j\omega$ term can be associated either with $X(\omega)$ or $H(\omega)$, using (3.60) with $n = 1$, we see that $dy(t)/dt$ can be written in either of the following two ways:

$$ \frac{dy(t)}{dt} = h * \frac{dx(t)}{dt} = \frac{dh(t)}{dt} * x. \tag{3.66c} $$

This formula, which resembles integration by parts, is important because it allows one to rewrite any convolution $h*x$ as the convolution of the primitive of one function and the derivative of the other,

$$ h * x = \frac{dh(t)}{dt} * x^{(-1)}(t) = h^{(-1)}(t) * \frac{dx(t)}{dt}, \tag{3.66d} $$

where $x^{(-1)}$ and $h^{(-1)}$ denote the primitives of $x(t)$ and $h(t)$, respectively. A pictorial example is shown in Figure 3.8.

Deterministic Autocorrelation The Fourier-transform pair corresponding to the deterministic autocorrelation of a function $x(t)$ is

$$ a(t) = \int_{-\infty}^{\infty} x(\tau)\,x^*(\tau-t)\,d\tau \;\overset{\text{FT}}{\longleftrightarrow}\; A(\omega) = |X(\omega)|^2. \tag{3.67} $$

To show this, express the deterministic autocorrelation as a convolution of $x$ and its conjugated, time-reversed version as in (3.37d), $x(t) * x^*(-t)$. We know from Table 3.2 that the Fourier transform of $x^*(-t)$ is $X^*(\omega)$. Then, using the convolution property (3.64), we obtain (3.67).

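Deterministic Crosscorrelation The Fourier-transform pair corresponding to the deterministic crosscorrelation of functions $x(t)$ and $y(t)$ is

$$ c(t) = \int_{-\infty}^{\infty} x(\tau)\,y^*(\tau-t)\,d\tau \;\overset{\text{FT}}{\longleftrightarrow}\; C(\omega) = X(\omega)\,Y^*(\omega). \tag{3.68} $$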



Parseval's Equality The Fourier-transform operator $F$ is a unitary operator (within scaling) and thus preserves the Euclidean norm (see (1.51)),

$$ \|x\|^2 = \frac{1}{2\pi}\,\|X\|^2. \tag{3.69a} $$

This follows from

$$ \|x\|^2 = \int_{-\infty}^{\infty} |x(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |X(\omega)|^2\,d\omega = \frac{1}{2\pi}\,\|X\|^2 = \frac{1}{2\pi}\,\|Fx\|^2. $$

We now prove Parseval's equality for functions that are in both $\mathcal{L}^1$ and $\mathcal{L}^2$:

$$ \|x\|^2 \overset{(a)}{=} a(0) = \left.\frac{1}{2\pi}\int_{-\infty}^{\infty} A(\omega)\,e^{j\omega t}\,d\omega\right|_{t=0} \overset{(b)}{=} \frac{1}{2\pi}\int_{-\infty}^{\infty} |X(\omega)|^2\,d\omega = \frac{1}{2\pi}\,\|X(\omega)\|^2, $$

where (a) follows from (3.14); and (b) from (3.67).

Similarly, generalized Parseval's equality can be written as

$$ \langle x, y\rangle_t = \frac{1}{2\pi}\,\langle X, Y\rangle_\omega, \tag{3.69b} $$

following from

$$ \langle x, y\rangle_t = \int_{-\infty}^{\infty} x(t)\,y^*(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\,Y^*(\omega)\,d\omega = \frac{1}{2\pi}\,\langle X, Y\rangle_\omega. $$

The proof, a simple extension of (3.69a), is left as Exercise 3.4. The key notion emerging from both Parseval's and generalized Parseval's equalities is the conservation of the $\mathcal{L}^2$ norm and inner product between time and frequency, showing the Fourier transform to be a unitary map between these domains (up to scaling).

Transform Pairs 

Rather than an exhaustive list of Fourier-transform pairs, we highlight a few which are routinely used, including some that are neither in $\mathcal{L}^1$ nor in $\mathcal{L}^2$. These, as well as some others, are summarized in Table 3.2.

Dirac Delta Function We summarized some of the commonly used Fourier-transform pairs involving the Dirac delta function in Table 3.2. Note that by now, we are far from $\mathcal{L}^1$ or $\mathcal{L}^2$, since the function or spectrum has no decay; these pairs exist in a generalized sense, using the distribution $\delta(t)$ or $\delta(\omega)$. Using this powerful formalism combined with the various properties derived previously, it is easy to compute a plethora of Fourier transforms, for example, the transforms of trigonometric functions, the step function, etc.; see Table 3.2.

Using the shift in time property of the Dirac delta function from Table 3.2 and linearity, we can compute the Fourier transform of the Dirac delta comb (3.7):

$$ S_T(\omega) = \sum_{n\in\mathbb{Z}} e^{-j\omega nT}. \tag{3.70} $$

One can show that the above sum is itself a sequence of Dirac delta functions,

$$ \sum_{n\in\mathbb{Z}} e^{-j\omega nT} = \frac{2\pi}{T}\sum_{k\in\mathbb{Z}} \delta\!\left(\omega - \frac{2\pi}{T}k\right). \tag{3.71} $$

This is a direct consequence of the Poisson sum formula, which we discuss next.



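Theorem 3.11 (Poisson sum formula) Let $s_T$ be the Dirac delta comb function (3.7), and let $x(t)$ be a function of sufficient decay so that its periodized version

$$ x_T(t) = (s_T * x)(t) = \sum_{n\in\mathbb{Z}} x(t - nT) \tag{3.72} $$

converges uniformly. Then:

(i) The Dirac delta comb function and its Fourier-transform pair are:

$$ s_T(t) = \sum_{n\in\mathbb{Z}} \delta(t - nT) \;\overset{\text{FT}}{\longleftrightarrow}\; S_T(\omega) = \frac{2\pi}{T}\sum_{k\in\mathbb{Z}} \delta\!\left(\omega - \frac{2\pi}{T}k\right). \tag{3.73a} $$

(ii) The Poisson sum formula states that:

$$ \sum_{n\in\mathbb{Z}} x(t - nT) = \frac{1}{T}\sum_{k\in\mathbb{Z}} X\!\left(\frac{2\pi}{T}k\right) e^{j(2\pi/T)kt}. \tag{3.73b} $$

(iii) As a corollary, for $T = 1$ and $t = 0$,

$$ \sum_{n\in\mathbb{Z}} x(n) = \sum_{n\in\mathbb{Z}} x(-n) = \sum_{k\in\mathbb{Z}} X(2\pi k). \tag{3.73c} $$

(iv) The following is a Fourier-transform pair:

$$ \langle x(t), x(t-n)\rangle_t = \delta_n \;\overset{\text{FT}}{\longleftrightarrow}\; \sum_{k\in\mathbb{Z}} |X(\omega + 2\pi k)|^2 = 1. \tag{3.73d} $$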



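Proof. The proof of (3.73a) is somewhat technical and is left to Exercise 3.5. Instead,
we use it to show (3.73b).

Since $x_T(t) = (s_T * x)(t)$, we can use the convolution property (3.64) to obtain its
Fourier-transform pair as

$$X_T(\omega) \overset{(a)}{=} S_T(\omega)X(\omega) \overset{(b)}{=} \frac{2\pi}{T}\sum_{k\in\mathbb{Z}} X(\omega)\,\delta\Big(\omega - \frac{2\pi}{T}k\Big) \overset{(c)}{=} \frac{2\pi}{T}\sum_{k\in\mathbb{Z}} X\Big(\frac{2\pi}{T}k\Big)\delta\Big(\omega - \frac{2\pi}{T}k\Big), \tag{3.74}$$

where (a) follows from the convolution property (3.64); (b) from (3.73a); and (c) from
the multiplication property in Table 3.1. Taking the inverse Fourier transform of (3.74),
we get

$$x_T(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{2\pi}{T}\sum_{k\in\mathbb{Z}} X\Big(\frac{2\pi}{T}k\Big)\delta\Big(\omega - \frac{2\pi}{T}k\Big)e^{j\omega t}\,d\omega \overset{(a)}{=} \frac{1}{T}\sum_{k\in\mathbb{Z}} X\Big(\frac{2\pi}{T}k\Big)e^{j(2\pi/T)kt},$$

where (a) follows again from the multiplication property in Table 3.1.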

To prove (3.73d), observe that the left-hand side is the deterministic autocorrelation
$a(\tau)$ evaluated at the integers $\tau = n$. Assuming $\langle x(t), x(t-n)\rangle_t = \delta_n$,

$$\delta(\tau) \overset{(a)}{=} \sum_{n\in\mathbb{Z}}\Big(\int_{-\infty}^{\infty} x(t)x(t-\tau)\,dt\Big)\delta(\tau - n) \overset{(b)}{=} \int_{-\infty}^{\infty} x(t)x(t-\tau)\,dt \sum_{n\in\mathbb{Z}}\delta(\tau-n) \overset{(c)}{=} a(\tau)\,s_1(\tau),$$

where (a) follows from Table 3.1 and the fact that the expression is nonzero only for
$n = 0$; (b) from pulling the Dirac delta comb out of the integral; and (c) from the
definition of the deterministic autocorrelation (3.14) (we assumed $x(t)$ was real) and the
Dirac delta comb for $T = 1$ (3.7). Taking the Fourier transform of the above, we get

$$1 \overset{(a)}{=} \frac{1}{2\pi}A(\omega) * S_1(\omega) \overset{(b)}{=} \frac{1}{2\pi}|X(\omega)|^2 * 2\pi\sum_{k\in\mathbb{Z}}\delta(\omega - 2\pi k) \overset{(c)}{=} \sum_{k\in\mathbb{Z}}|X(\omega)|^2 * \delta(\omega - 2\pi k) \overset{(d)}{=} \sum_{k\in\mathbb{Z}}|X(\omega - 2\pi k)|^2,$$

where (a) follows from the convolution-in-frequency property of the Fourier transform
(3.65); (b) from (3.67) and (3.73a); in (c) we pulled out the summation in front of the
convolution; and (d) from Table 3.1. Since the sum runs over all $k\in\mathbb{Z}$, replacing
$k$ by $-k$ gives the right-hand side of (3.73d).

The Poisson sum formula has many applications; in signal processing, it is most
often used in the proof of the sampling theorem (see Chapter 3), since the process
of sampling can be described as a multiplication of an input signal $x(t)$ by a Dirac
delta comb $s_T(t)$.
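As a quick numerical sanity check of (3.73b) and (3.73c), the sketch below compares both sides of the Poisson sum formula for a Gaussian test function, using the transform pair (3.78) with $\gamma = 1$. The choice of test function, the truncation limits `n_max` and `k_max`, and all function names are illustrative assumptions; the Gaussian is convenient because both sums converge very quickly.

```python
import cmath
import math


def x_gauss(t, alpha):
    """Assumed test function x(t) = exp(-alpha t^2)."""
    return math.exp(-alpha * t * t)


def X_gauss(w, alpha):
    """Its Fourier transform from (3.78) with gamma = 1: sqrt(pi/alpha) exp(-w^2/(4 alpha))."""
    return math.sqrt(math.pi / alpha) * math.exp(-w * w / (4.0 * alpha))


def poisson_lhs(t, T, alpha, n_max=60):
    """Left side of (3.73b): sum_n x(t - nT), truncated to |n| <= n_max."""
    return sum(x_gauss(t - n * T, alpha) for n in range(-n_max, n_max + 1))


def poisson_rhs(t, T, alpha, k_max=60):
    """Right side of (3.73b): (1/T) sum_k X(2 pi k/T) e^{j (2 pi/T) k t}, truncated to |k| <= k_max."""
    total = 0j
    for k in range(-k_max, k_max + 1):
        w = 2.0 * math.pi * k / T
        total += X_gauss(w, alpha) * cmath.exp(1j * w * t)
    return total / T


if __name__ == "__main__":
    # General T and t, as in (3.73b); the right side is real up to rounding.
    print(poisson_lhs(0.3, 1.5, 0.7), poisson_rhs(0.3, 1.5, 0.7))
    # Corollary (3.73c) with T = 1, t = 0: sum_n x(n) = sum_k X(2 pi k).
    print(poisson_lhs(0.0, 1.0, 0.5), poisson_rhs(0.0, 1.0, 0.5).real)
```

The periodized sum on the left and the sampled spectrum on the right agree to rounding accuracy, which is exactly the statement of (3.73b).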

Sinc Function While we have seen the Fourier-transform pair of the sinc function
a few times so far, we repeat it here for completeness. The sinc function was given
in (2.8a) and Example 3.5. The sinc Fourier-transform pair is (scaled so that the
time-domain function has unit norm):

$$x(t) = \sqrt{\frac{\omega_0}{2\pi}}\,\mathrm{sinc}\Big(\frac{\omega_0 t}{2}\Big) \;\overset{\text{FT}}{\longleftrightarrow}\; X(\omega) = \begin{cases}\sqrt{2\pi/\omega_0}, & |\omega| < \omega_0/2;\\ 0, & \text{otherwise.}\end{cases} \tag{3.75}$$

In other words, the Fourier transform of a sinc is a box function in frequency, that is,
an ideal lowpass filter (see Figure 3.9(a) and (d)).



Box Function By duality, the Fourier transform of the box function (3.11) in time
is the sinc function in frequency (see Figure 3.9(b) and (e)):

$$x(t) = \begin{cases} 1/\sqrt{t_0}, & |t| < t_0/2;\\ 0, & \text{otherwise}, \end{cases} \;\overset{\text{FT}}{\longleftrightarrow}\; X(\omega) = \sqrt{t_0}\,\mathrm{sinc}(\omega t_0/2). \tag{3.76}$$

Heaviside Function The Heaviside function was defined in (3.8). Its Fourier transform
(in the sense of distributions) is

$$u(t) = \begin{cases} 1, & t \ge 0;\\ 0, & \text{otherwise}, \end{cases} \;\overset{\text{FT}}{\longleftrightarrow}\; U(\omega) = \pi\delta(\omega) + \frac{1}{j\omega}. \tag{3.77}$$
[Figure 3.9 panels: (a) Sinc function. (b) Box function. (c) Gaussian function. (d), (e), (f) Fourier transforms of the functions in (a), (b), (c), respectively.]

Figure 3.9: Time and frequency views of a function.



Gaussian Function The Fourier transform of a Gaussian function in time is a
Gaussian function in frequency (see Figure 3.9(c) and (f)):

$$g(t) = \gamma e^{-\alpha t^2} \;\overset{\text{FT}}{\longleftrightarrow}\; G(\omega) = \gamma\sqrt{\frac{\pi}{\alpha}}\,e^{-\omega^2/(4\alpha)}, \tag{3.78}$$

for $\alpha, \gamma$ positive real constants. That this is a Fourier-transform pair can be proven
in various ways; the easiest is to observe that $e^{-\alpha t^2}$ is, up to a constant factor, the
solution of the differential equation

$$\frac{dx(t)}{dt} + 2\alpha t\,x(t) = 0.$$

Taking the Fourier transform and using (3.61b) and (3.60) leads to an equivalent
differential equation in the Fourier domain, which, when solved, yields (3.78) (the topic
of Exercise 3.7).
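To see (3.78) numerically rather than through the differential-equation argument, one can approximate the Fourier integral by a trapezoidal sum on a wide, finite interval. This is a minimal sketch under assumed parameters (`t_max`, `n`, and the chosen $\alpha$, $\gamma$); the trapezoidal rule happens to be extremely accurate for smooth, rapidly decaying integrands such as the Gaussian.

```python
import cmath
import math


def gaussian(t, alpha, gamma):
    """g(t) = gamma * exp(-alpha t^2), the time-domain side of (3.78)."""
    return gamma * math.exp(-alpha * t * t)


def gaussian_ft_closed(w, alpha, gamma):
    """G(w) = gamma * sqrt(pi/alpha) * exp(-w^2/(4 alpha)), the frequency-domain side of (3.78)."""
    return gamma * math.sqrt(math.pi / alpha) * math.exp(-w * w / (4.0 * alpha))


def ft_trapezoid(x, w, t_max=20.0, n=20001):
    """Approximate X(w) = int x(t) e^{-jwt} dt by the trapezoidal rule on [-t_max, t_max].

    Truncating the integral is harmless when x(t) is negligible beyond |t| = t_max.
    """
    dt = 2.0 * t_max / (n - 1)
    total = 0j
    for i in range(n):
        t = -t_max + i * dt
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * x(t) * cmath.exp(-1j * w * t)
    return total * dt


if __name__ == "__main__":
    alpha, gamma = 1.0, 2.0
    for w in (0.0, 1.5, 4.0):
        approx = ft_trapezoid(lambda t: gaussian(t, alpha, gamma), w)
        print(w, approx.real, gaussian_ft_closed(w, alpha, gamma))
```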



Decay and Smoothness 

The Fourier transform allows us to go from the time domain to its dual, the fre- 
quency domain, and back. Given this duality and the nature of the Fourier trans- 
form, which takes a global view, local properties can be mapped to global features, 
and vice versa. Our goal is to characterize functions in terms of how smooth, or, 
regular, they are. As we will see, the decay of the corresponding Fourier trans- 
form is a powerful indicator of such regularity. We start with the coarsest version: 
functions with p continuous derivatives. We follow with Lipschitz regularity, which 
allows for a more local measurement on an interval or even at a point. 
$C^p$ Regularity The global view taken by the Fourier transform allows for easy characterization
of functions with $p$ continuous derivatives, that is, those functions belonging to $C^p$
spaces (see Section [O]). We have seen one such result in Section 3.4.2 (and Solved
Exercise 3.3), where we stated that if $x(t) \in \mathcal{L}^1(\mathbb{R})$, then its Fourier transform $X(\omega)$
is bounded and continuous. Conversely, if $|X(\omega)|$ decays faster than $1/|\omega|$ for large
$|\omega|$, then $x(t)$ is bounded and continuous. More precisely, if

$$|X(\omega)| \le \frac{\gamma}{1 + |\omega|^{1+\varepsilon}} \tag{3.79a}$$

for some constant $\gamma$ and $\varepsilon > 0$, then $|X(\omega)|$ is absolutely integrable, or, $X(\omega) \in \mathcal{L}^1(\mathbb{R})$.
Therefore, the inverse Fourier transform $x(t)$ is bounded and continuous, or, $x(t) \in C^0$.
We can easily extend this argument (Exercise 3.9) to show that if

$$|X(\omega)| \le \frac{\gamma}{1 + |\omega|^{p+1+\varepsilon}}, \tag{3.79b}$$

then $x(t) \in C^p$. Conversely, for $x(t) \in C^p$, its Fourier transform is bounded by

$$|X(\omega)| \le \frac{\gamma}{1 + |\omega|^{p+1}}. \tag{3.79c}$$

For $x(t) \in C^0$ ($p = 0$), its Fourier transform is bounded by

$$|X(\omega)| \le \frac{\gamma}{1 + |\omega|}. \tag{3.79d}$$

The slight asymmetry between (3.79b) and (3.79c) comes from the fact that if
$|X(\omega)|$ decays as $1/(1 + |\omega|^{p+1})$, then it can happen that the $p$th derivative exists
but is discontinuous, as we show now.

Example 3.8 (Decay and smoothness) Let us consider a few examples.

(i) We start with the box function (3.11) (with $t_0 = 1$) and its Fourier transform
$X(\omega)$ (3.51a). Since $X(\omega)$ is a sinc, the best we can do is to say that
it is bounded by $\gamma/(1 + |\omega|)$. Therefore, we cannot say whether $x(t)$ is
continuous (of course, we know it is not).

(ii) Next, consider the hat function $x(t)$ from (3.49a). It is continuous, and thus,
its Fourier transform decays at least as $\gamma/(1 + |\omega|)$, for some $\gamma$, according
to (3.79d); actually, it decays much faster, as $\gamma/(1 + |\omega|^2)$.$^{74}$

(iii) Finally, we look at a function that has fast decay and is very smooth except
for a single point $t = 0$, where its derivative is discontinuous:

$$x(t) = e^{-\alpha|t|}. \tag{3.80a}$$

74 Note that throughout we use a generic constant $\gamma$, with the understanding that each $\gamma$ may
differ from the previous one. This is to avoid introducing several constants.
It is thus a $C^0$ function, even though it is $C^\infty$ everywhere except at the
origin. Its Fourier transform is

$$X(\omega) = \int_{-\infty}^{\infty} e^{-\alpha|t|}e^{-j\omega t}\,dt \overset{(a)}{=} \int_0^{\infty} e^{-(\alpha - j\omega)t}\,dt + \int_0^{\infty} e^{-(\alpha + j\omega)t}\,dt = \frac{1}{\alpha - j\omega} + \frac{1}{\alpha + j\omega} = \frac{2\alpha}{\alpha^2 + \omega^2}, \tag{3.80b}$$

where in (a) we split the integral over negative/positive $t$, respectively, and
then changed variable in the first one (sign change). Since $x(t)$ is continuous
(that is, it is in $C^0$), according to (3.79d) its Fourier transform must decay
at least as $\gamma/(1 + |\omega|)$. Conversely, since the Fourier transform decays
exactly as $\gamma/(1 + |\omega|^2)$ but not faster, we cannot say more than that $x(t)$
must be continuous.

These three examples illustrate the power of the Fourier transform in analyzing 
functions; we will see in the latter part of the book that, in addition, wavelet 
tools allow for local characterization as well. 
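The decay claims in Example 3.8(iii) can be checked with a short computation: a trapezoidal approximation of the Fourier integral of $e^{-\alpha|t|}$ should match (3.80b), and $\omega^2 X(\omega)$ should level off at $2\alpha$, confirming decay exactly as $1/\omega^2$. The integration parameters and function names below are illustrative assumptions, not part of the text.

```python
import math


def exp_decay_ft_closed(w, alpha):
    """(3.80b): X(w) = 2 alpha / (alpha^2 + w^2)."""
    return 2.0 * alpha / (alpha * alpha + w * w)


def exp_decay_ft_numeric(w, alpha, t_max=40.0, n=40001):
    """Trapezoidal approximation of the Fourier transform of exp(-alpha |t|).

    Because x(t) is even, X(w) = 2 * int_0^inf exp(-alpha t) cos(w t) dt; the tail beyond
    t_max is negligible for the alpha used here.
    """
    dt = t_max / (n - 1)
    total = 0.0
    for i in range(n):
        t = i * dt
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-alpha * t) * math.cos(w * t)
    return 2.0 * total * dt


if __name__ == "__main__":
    alpha = 1.0
    for w in (0.5, 2.0, 5.0):
        print(w, exp_decay_ft_numeric(w, alpha), exp_decay_ft_closed(w, alpha))
    # w^2 X(w) tends to 2 alpha: the decay is exactly 1/w^2, not faster.
    for w in (10.0, 100.0, 1000.0):
        print(w, w * w * exp_decay_ft_closed(w, alpha))
```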



Lipschitz Regularity Membership in a $C^p$ class gives us a coarse idea of the
smoothness of a function. A finer analysis, allowing for characterizing local smooth-
ness of a function on an interval or even at a point, is possible using Lipschitz (or
Hölder) exponents. Beware, however, that the Fourier characterization we discuss
here is, as before, global.

Definition 3.12 (Lipschitz regularity) A function $x(t)$ is called Lipschitz of order $\alpha$, $0 < \alpha < 1$, when, for any $t$ and $t_0$,

$$|x(t) - x(t_0)| \le c\,|t - t_0|^{\alpha}, \tag{3.81}$$

for some constant $c > 0$.

Higher-order characterization for $r = n + \alpha$, $n \in \mathbb{N}$, can be obtained by
replacing $x(t)$ by its $n$th derivative. Note that Lipschitz regularity provides a sense
of noninteger differentiability, but is actually weaker than the differentiability of
the same integer order. For example, the hat function (3.49a) is Lipschitz of order
$(1 - \varepsilon)$ for any positive $\varepsilon$, while only $C^0$.

To see how this regularity manifests itself in the Fourier domain, similarly
to (3.79b), we show in Solved Exercise 3.10 that a function $x(t)$ is bounded and
Lipschitz $\alpha$ when

$$\int_{-\infty}^{\infty} |X(\omega)|\,(1 + |\omega|^{\alpha})\,d\omega < \infty, \tag{3.82}$$

providing a global characterization of regularity. Interestingly, Lipschitz regularity
can be defined locally, thus providing a local characterization as well; we will have
to wait for the wavelet transform in Part II to see such local characterization.
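Since (3.81) has to hold for all pairs $t, t_0$, a finite computation can only produce a lower bound on the best constant $c$. The sketch below is an illustration under assumed choices (the test function $|t|^{1/2}$ and a geometric grid of points near the origin). It shows the ratio in (3.81) staying bounded by 1 for $\alpha = 1/2$, while for $\alpha = 0.8$ it blows up as points approach $t = 0$. This is consistent with $|t|^{1/2}$ being Lipschitz of order $1/2$ but no higher.

```python
import math


def sqrt_abs(t):
    """Assumed test function x(t) = |t|^(1/2); it is not differentiable at t = 0."""
    return math.sqrt(abs(t))


def holder_constant_lower_bound(x, alpha, pairs):
    """Largest ratio |x(t) - x(t0)| / |t - t0|^alpha over the given (t, t0) pairs.

    Over a finite set of pairs this is only a lower bound on the best c in (3.81).
    """
    best = 0.0
    for t, t0 in pairs:
        if t != t0:
            best = max(best, abs(x(t) - x(t0)) / abs(t - t0) ** alpha)
    return best


# Points on a geometric grid between 1e-8 and 1 on both sides of the origin, plus 0 itself.
grid = [s * 10.0 ** (-e / 4.0) for e in range(33) for s in (-1.0, 1.0)] + [0.0]
pairs = [(t, t0) for t in grid for t0 in grid]

if __name__ == "__main__":
    print(holder_constant_lower_bound(sqrt_abs, 0.5, pairs))  # at most 1, up to rounding
    print(holder_constant_lower_bound(sqrt_abs, 0.8, pairs))  # large, driven by points near 0
```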
3.4.4 Frequency Response of Filters

The Fourier transform is defined for functions, and we use spectrum to denote their
Fourier transforms. The frequency response is defined for filters (systems) as

$$H(\omega) = \int_{-\infty}^{\infty} h(t)e^{-j\omega t}\,dt, \quad \omega\in\mathbb{R}, \tag{3.83a}$$

with the corresponding impulse response,

$$h(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} H(\omega)e^{j\omega t}\,d\omega, \quad t\in\mathbb{R}. \tag{3.83b}$$

As in discrete time, we often write the magnitude and phase of the frequency response
separately:

$$H(\omega) = |H(\omega)|\,e^{j\arg(H(\omega))},$$

where $|H(\omega)|$ is a real nonnegative function, the magnitude response, while $\arg(H(\omega))$
is a real function between $-\pi$ and $\pi$, the phase response.
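As an illustration of (3.83a) and the magnitude/phase split, consider the causal filter $h(t) = e^{-at}u(t)$, an assumed example not taken from the text, whose frequency response is $H(\omega) = 1/(a + j\omega)$. The sketch computes $|H(\omega)|$ and $\arg(H(\omega))$ with Python's `cmath` module and cross-checks the closed form against a trapezoidal approximation of the integral; `cmath.phase` returns the principal value in $[-\pi, \pi]$.

```python
import cmath
import math


def first_order_response(w, a=1.0):
    """Closed-form H(w) = 1 / (a + j w) for the assumed example filter h(t) = exp(-a t) u(t)."""
    return 1.0 / (a + 1j * w)


def fr_trapezoid(w, a=1.0, t_max=40.0, n=40001):
    """Trapezoidal approximation of (3.83a) for the same h(t), which vanishes for t < 0."""
    dt = t_max / (n - 1)
    total = 0j
    for i in range(n):
        t = i * dt
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-a * t) * cmath.exp(-1j * w * t)
    return total * dt


if __name__ == "__main__":
    for w in (0.0, 1.0, 10.0):
        H = first_order_response(w)
        # Magnitude response |H(w)|, phase response arg(H(w)), and the quadrature error.
        print(w, abs(H), cmath.phase(H), abs(fr_trapezoid(w) - H))
```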

Diagonalization of the Convolution Operator As in Chapter 2, from the form
of (3.48a), the Fourier transform is clearly a linear operator. Let us denote this
through $X = Fx$. The adjoint $F^*$ is also a linear operator, and $(1/2\pi)F^*$ is the
inverse Fourier-transform operator. Thus $(1/2\pi)F^*F$ is an identity operator.

While the Fourier-transform operator $F$ diagonalizes any convolution operator
$H$, this is a bit subtle. Though we cannot describe $H$ as having a matrix representa-
tion in a basis associated with the Fourier transform, the essence of diagonalization
is present in (3.47). The composition $FHF^*$ is a linear operator on the space of
functions. Since the input $X(\omega)$ can be expanded as $X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt$, we
can conclude from (3.47) that $Y = (1/2\pi)FHF^*X$ results in $Y(\omega) = H(\omega)X(\omega)$.
This captures the notion of diagonalization without writing a diagonal matrix.

3.4.5 Laplace Transform

In the previous chapter, we introduced the z-transform, which extends the DTFT
from the unit circle to the complex plane, by constructing spaces spanned by com-
plex exponentials of nonunit magnitude. The z-transform is used when the DTFT does
not exist. Similarly, in this chapter, we introduce the Laplace transform, which
extends the Fourier transform from exponentials with purely imaginary exponents (the
imaginary axis) to the entire complex plane, by constructing spaces spanned by general
complex exponentials. The Laplace transform is used when the Fourier transform does not
exist. The two transforms have similar properties, allowing us to extend the validity
of many results to a larger class of functions.

Definition 3.13 (Laplace transform) The Laplace transform of a function
$x(t)$ is a function of $s = \sigma + j\omega$, $s\in\mathbb{C}$, given by

$$X(s) = \int_{-\infty}^{\infty} x(t)e^{-st}\,dt. \tag{3.84}$$
When $s = j\omega$, the Laplace transform is simply the Fourier transform; when
$s = \sigma + j\omega$, the Laplace transform is the Fourier transform of $x'(t) = x(t)e^{-\sigma t}$, or

$$X(\sigma + j\omega) = \int_{-\infty}^{\infty} x(t)e^{-\sigma t}e^{-j\omega t}\,dt. \tag{3.85}$$

Therefore, convergence of the Laplace transform depends solely on the real part $\sigma$
of the exponent, and the ROC of the integral (3.84) consists of vertical strips. In particular,
when $x(t)e^{-\sigma t}$ is absolutely integrable, then (3.84) converges pointwise to a bounded and
continuous function. Just as for the z-transform, the Laplace transform together with its
associated ROC defines the time-domain function. The Laplace transform satisfies a
number of properties that follow directly from their Fourier counterparts; selected
ones are summarized in Table 3.3. We now illustrate the necessity of associating an
ROC with the Laplace transform of a function.



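LT properties | Time domain | LT domain | ROC
Linearity | $\alpha x(t) + \beta y(t)$ | $\alpha X(s) + \beta Y(s)$ | $\supset \mathrm{ROC}_x \cap \mathrm{ROC}_y$
Shift in time | $x(t - t_0)$ | $e^{-st_0}X(s)$ | $\mathrm{ROC}_x$
Shift in $s$ | $e^{s_0 t}x(t)$ | $X(s - s_0)$ | $\mathrm{ROC}_x + \Re(s_0)$
Scaling in time | $x(at)$ | $(1/\lvert a\rvert)X(s/a)$ | $a\,\mathrm{ROC}_x$
Convolution in time | $(h * x)(t)$ | $H(s)X(s)$ | $\supset \mathrm{ROC}_h \cap \mathrm{ROC}_x$

Table 3.3: Selected properties of the Laplace transform.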

Example 3.9 (Laplace transform of the Heaviside function) Consider
the Heaviside function $x(t) = u(t)$ defined in (3.8). Its Fourier transform exists
in the sense of distributions, and is given in (3.77). Its Laplace transform

$$X(s) = \int_{-\infty}^{\infty} x(t)e^{-st}\,dt = \int_0^{\infty} e^{-st}\,dt, \tag{3.86a}$$

is well defined for $\sigma > 0$, yielding $X(s) = 1/s$:

$$x(t) \;\overset{\text{LT}}{\longleftrightarrow}\; X(s) = \frac{1}{s}, \qquad \mathrm{ROC} = \{s \mid \Re(s) > 0\}. \tag{3.86b}$$

However, $x(t) = -u(-t)$ does not have a Fourier transform (this is a bit techni-
cal), but its Laplace transform exists and is also $X(s) = 1/s$, for $\sigma < 0$:

$$x(t) \;\overset{\text{LT}}{\longleftrightarrow}\; X(s) = \frac{1}{s}, \qquad \mathrm{ROC} = \{s \mid \Re(s) < 0\}. \tag{3.86c}$$

This example shows two important points. (1) The Laplace transform must have
an ROC associated with it, since otherwise, two different functions can have the
same Laplace transform expression. (2) Functions $x(t)$ for which the Fourier
transform does not exist can have a well-defined Laplace transform.
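The role of the ROC in Example 3.9 shows up clearly if one truncates the integrals at a finite horizon $M$ and lets $M$ grow: for $x(t) = u(t)$ the truncated integral settles to $1/s$ only when $\Re(s) > 0$, and for $x(t) = -u(-t)$ only when $\Re(s) < 0$. The sketch below uses the closed-form antiderivative $-e^{-st}/s$; the function names and the horizon $M$ are illustrative.

```python
import cmath


def lt_partial_causal(s, M):
    """int_0^M e^{-st} dt for x(t) = u(t), using the antiderivative -e^{-st}/s."""
    return (1.0 - cmath.exp(-s * M)) / s


def lt_partial_anticausal(s, M):
    """int_{-M}^0 (-1) e^{-st} dt for x(t) = -u(-t), using the same antiderivative."""
    return (1.0 - cmath.exp(s * M)) / s


if __name__ == "__main__":
    for s in (0.5 + 2j, -0.5 + 2j):
        for M in (10.0, 50.0):
            # Only the function whose ROC contains s settles to 1/s as M grows.
            print(s, M, lt_partial_causal(s, M), lt_partial_anticausal(s, M), 1 / s)
```

Outside its ROC, each truncated integral grows like $e^{|\sigma| M}$, so no limit exists there.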
3.5 Fourier Series

Periodic functions, as well as finite-length functions circularly extended, as intro-
duced in Section 3.2, have a series expansion in terms of Fourier coefficients. They
are of great mathematical interest, as questions of convergence occupied mathemati-
cians for a long time after Fourier's original work. Moreover, the Fourier series is
just the dual of the discrete-time Fourier transform seen in the previous chapter,
and thus, its study is central to discrete-time signal processing as well.

As we have done so far, we will also look at the Fourier series in terms of 
orthonormal bases and their geometrical properties, and will thus be interested in 
functions having a square integrable period. For such functions, the central result 
is the completeness of the Fourier series basis. 

Our treatment of the Fourier series will be brief; most of its properties are 
similar to those of the Fourier transform, or, its discrete-time counterpart, DFT. 
We follow a similar path as before, and show how, by defining an appropriate 
convolution for periodic functions, the Fourier series emerges naturally. We then 
define the Fourier series formally and proceed to discuss a few of its properties, 
including the duality with the DTFT. 

3.5.1 Definition of the Fourier Series 

Eigenfunctions of the Circular Convolution Operator We now follow the same
path as in both the previous chapter and the discussion of the Fourier transform in
Section 3.4. Our aim is to find spaces of functions that are invariant under
the operation of circular convolution we defined in (3.9). In other words, we want
to find eigenfunctions $v$ of the circular convolution operator $H$ as in (3.42).

These eigenfunctions are clearly going to be some form of a complex exponen-
tial function; however, since we are dealing with the periodic convolution operator,
these will be of the form (compare this to (3.44) as well as (2.156) in Chapter 2):

$$v_k(t) = e^{j\omega_0 kt} = e^{j(2\pi/T)kt}, \tag{3.87}$$

generating the corresponding spaces $S_k = \{\alpha e^{j(2\pi/T)kt} \mid \alpha\in\mathbb{C}\}$, $k\in\mathbb{Z}$. Because
they have to be periodic with period $T$, there are countably many of these spaces
(index $k\in\mathbb{Z}$) as opposed to uncountably many for the Fourier transform (index
$\omega\in\mathbb{R}$). The quantity $k$ is called discrete frequency.

We can now follow exactly the same process to check what happens if we apply
the circular convolution operator onto one of these eigenfunctions:

$$Hv_k(t) = (h\circledast v_k)(t) = \int_{-T/2}^{T/2} v_k(t-\tau)h(\tau)\,d\tau = \int_{-T/2}^{T/2} e^{j(2\pi/T)k(t-\tau)}h(\tau)\,d\tau = \underbrace{\int_{-T/2}^{T/2} h(\tau)e^{-j(2\pi/T)k\tau}\,d\tau}_{\lambda_k}\;\underbrace{e^{j(2\pi/T)kt}}_{v_k(t)}. \tag{3.88}$$

As expected, applying the circular convolution operator to the complex exponential
function $v_k(t) = e^{j(2\pi/T)kt}$ results in the same function, albeit scaled by the corre-
sponding eigenvalue $\lambda_k$. As before, this eigenvalue is given by the frequency response
of the system; with the frequency response $H_k$ normalized as in (3.117a), $\lambda_k = T H_k$.
We can thus rewrite (3.88) as

$$(h\circledast v_k)(t) = T H_k\, v_k(t). \tag{3.89}$$

Fourier Series Finding the appropriate Fourier transform of $x$ now amounts to
projecting $x$ onto each of the $S_k$:

Definition 3.14 (Fourier series) The Fourier series of a periodic function $x(t)$ with period $T$ is

$$X_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)e^{-j(2\pi/T)kt}\,dt, \quad k\in\mathbb{Z}; \tag{3.90a}$$

we call it the spectrum of $x$. The inverse Fourier series of $X_k$ is

$$x(t) = \sum_{k\in\mathbb{Z}} X_k e^{j(2\pi/T)kt}, \quad t\in[-T/2, T/2). \tag{3.90b}$$

It exists when (3.90b) converges for all $t$. When the inverse Fourier series exists,
we denote the Fourier-series pair as

$$x(t) \;\overset{\text{FS}}{\longleftrightarrow}\; X_k.$$

One often denotes $\omega_0 = 2\pi/T$ as the fundamental frequency $1/T$ expressed
in radians. The coefficients $X_k$ are called Fourier coefficients. The factor $1/T$ in
(3.90a) makes the transform unitary up to the scale factor $T$, as we will see shortly.
Exercise 3.11 explores the connection between the real Fourier series and (3.90b).
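Definition 3.14 translates directly into code: the analysis integral (3.90a) can be approximated by an equispaced rectangle rule over one period, and (3.90b) truncated to finitely many terms. The sketch below assumes a hypothetical test signal $1 + 2\cos(\omega_0 t) + \sin(3\omega_0 t)$ with $T = 2$. For this signal the rectangle rule is exact up to rounding, so the computed coefficients can be compared with the values read off from Euler's formula.

```python
import cmath
import math


def fs_coefficients(x, T, K, n=1024):
    """X_k of (3.90a) for |k| <= K, using an n-point rectangle rule on [-T/2, T/2).

    For a trigonometric polynomial of degree well below n/2 the rule is exact up to rounding.
    """
    dt = T / n
    ts = [-T / 2 + i * dt for i in range(n)]
    coeffs = {}
    for k in range(-K, K + 1):
        acc = sum(x(t) * cmath.exp(-2j * math.pi * k * t / T) for t in ts)
        coeffs[k] = acc * dt / T
    return coeffs


def fs_synthesis(coeffs, T, t):
    """Truncated inverse Fourier series (3.90b) built from the given coefficients."""
    return sum(c * cmath.exp(2j * math.pi * k * t / T) for k, c in coeffs.items())


T = 2.0
w0 = 2 * math.pi / T  # fundamental frequency omega_0


def x(t):
    # Hypothetical test signal: X_0 = 1, X_1 = X_{-1} = 1, X_3 = -j/2, X_{-3} = j/2, others 0.
    return 1 + 2 * math.cos(w0 * t) + math.sin(3 * w0 * t)


X = fs_coefficients(x, T, K=5)

if __name__ == "__main__":
    for k, c in X.items():
        print(k, round(c.real, 12), round(c.imag, 12))
    print(fs_synthesis(X, T, 0.37), x(0.37))
```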

Fourier Series as an Orthonormal Basis When a period of a function $x(t)$ is
square integrable, the inversion of the Fourier series in the $\mathcal{L}^2$ sense is guaranteed.
The best way to see this is by considering the normalized complex exponentials of
frequencies $\{(2\pi/T)k\}_{k\in\mathbb{Z}}$ as an orthonormal basis for the interval $[-T/2, T/2)$.

Theorem 3.15 (Fourier series as an orthonormal basis) The set

$$\varphi_k(t) = \frac{1}{\sqrt{T}}e^{j(2\pi/T)kt}, \quad k\in\mathbb{Z},\; t\in[-T/2, T/2), \tag{3.91}$$

forms an orthonormal basis for $\mathcal{L}^2([-T/2, T/2))$.

Proof. It is easy to see that $\{\varphi_k(t)\}$ is an orthonormal system, since

$$\langle \varphi_k, \varphi_\ell\rangle = \frac{1}{T}\int_{-T/2}^{T/2} e^{j(2\pi/T)(k-\ell)t}\,dt = \delta_{k-\ell}. \tag{3.92}$$

To show completeness, compute the normalized Fourier series coefficients of a
function $x(t)$ in $\mathcal{L}^2([-T/2, T/2))$:

$$\langle x, \varphi_k\rangle = \frac{1}{\sqrt{T}}\int_{-T/2}^{T/2} x(t)e^{-j(2\pi/T)kt}\,dt = \sqrt{T}\,X_k, \tag{3.93}$$

and the $(2N+1)$-term approximation of $x(t)$

$$x_N(t) = \sum_{k=-N}^{N} \langle x, \varphi_k\rangle\,\varphi_k(t) = \sum_{k=-N}^{N} X_k e^{j(2\pi/T)kt}. \tag{3.94}$$

This approximation is an orthogonal projection of $x(t)$ onto the subspace spanned by
$\{\varphi_k(t)\}_{k=-N}^{N}$. We now need to show that

$$\lim_{N\to\infty}\int_{-T/2}^{T/2} |x(t) - x_N(t)|^2\,dt = 0, \tag{3.95}$$

for any $x(t)$ in $\mathcal{L}^2([-T/2, T/2))$. This is left to Exercise 3.12, which considers continuous
functions. The result can also be extended to general $\mathcal{L}^2$ functions by the argument
that continuous functions are dense in $\mathcal{L}^2$.

This Hilbert space view of Fourier series is not only important from a math- 
ematical point of view, but also leads to geometric intuition, for example, on least 
squares approximation. We summarize this Hilbert space view in the following 
theorem: 

Theorem 3.16 (Fourier series in $\mathcal{L}^2$) Given a $T$-periodic $x(t)$ with one period in
$\mathcal{L}^2([-T/2, T/2))$:

(i) $\mathcal{L}^2$ inversion: The function $x(t)$ can be written as

$$x(t) = \sum_{k\in\mathbb{Z}} X_k e^{j(2\pi/T)kt} \tag{3.96a}$$

with Fourier coefficients

$$X_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)e^{-j(2\pi/T)kt}\,dt, \tag{3.96b}$$

with equality in the $\mathcal{L}^2$ sense.

(ii) Norm conservation: The map is an isometry (distance-preserving map), up to a scale
factor, between $\mathcal{L}^2([-T/2, T/2))$ and $\ell^2(\mathbb{Z})$, given by the generalized Parseval's equality

$$\langle x, y\rangle_t = \int_{-T/2}^{T/2} x(t)y^*(t)\,dt = T\sum_{k\in\mathbb{Z}} X_k Y_k^*, \tag{3.97a}$$

or, for $x(t) = y(t)$, Parseval's equality

$$\|x\|^2 = \int_{-T/2}^{T/2} |x(t)|^2\,dt = T\sum_{k\in\mathbb{Z}} |X_k|^2. \tag{3.97b}$$

(iii) Best least squares approximation: The function

$$x_N(t) = \sum_{k=-N}^{N} X_k e^{j(2\pi/T)kt} \tag{3.97c}$$

is the best least squares approximation of $x(t)$ on the subspace $S$ spanned by
$\{\varphi_k(t)\}_{k=-N}^{N}$.

Proof. Part (i) follows from the orthonormal basis property of $\{\varphi_k(t)\}_{k\in\mathbb{Z}}$ shown in the
previous theorem, and Part (ii) is equivalent to (P3.12-2) with renormalization. Finally,
consider an arbitrary function $y_N(t) \in S$,

$$y_N(t) = \sum_{k=-N}^{N} \alpha_k e^{j(2\pi/T)kt}.$$

Now, following (3.97b), with $Y_k = \alpha_k$ for $|k| \le N$ and $Y_k = 0$ otherwise,

$$\frac{1}{T}\int_{-T/2}^{T/2} |x(t) - y_N(t)|^2\,dt = \sum_{k\in\mathbb{Z}} |X_k - Y_k|^2 = \sum_{k=-N}^{N} |X_k - \alpha_k|^2 + \sum_{|k|>N} |X_k|^2,$$

which is minimized when $\alpha_k = X_k$ for $k\in[-N, N]$, or $y_N(t) = x_N(t)$.
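Parts (ii) and (iii) of Theorem 3.16 can be illustrated numerically. The sketch below assumes the smooth periodic test function $e^{\cos(2\pi t/T)}$ (not from the text) and approximates all integrals by the rectangle rule. It then performs three checks: the energy matches $T\sum_k |X_k|^2$, the squared error of $x_N$ equals $T\sum_{|k|>N}|X_k|^2$, and perturbing one coefficient of $x_N$ increases the error by exactly $T$ times the squared perturbation, as the proof predicts.

```python
import cmath
import math

T = 1.0
n = 256  # samples per period for the rectangle rule
ts = [-T / 2 + i * T / n for i in range(n)]


def x(t):
    # Assumed smooth T-periodic test function; its Fourier coefficients decay very fast.
    return math.exp(math.cos(2 * math.pi * t / T))


def coeff(k):
    """Rectangle-rule approximation of X_k in (3.96b)."""
    return sum(x(t) * cmath.exp(-2j * math.pi * k * t / T) for t in ts) / n


def energy(f):
    """Rectangle-rule approximation of int_{-T/2}^{T/2} |f(t)|^2 dt."""
    return sum(abs(f(t)) ** 2 for t in ts) * T / n


K, N = 30, 2
X = {k: coeff(k) for k in range(-K, K + 1)}


def x_N(t):
    # Best approximation (3.97c) with 2N + 1 terms.
    return sum(X[k] * cmath.exp(2j * math.pi * k * t / T) for k in range(-N, N + 1))


def y_N(t):
    # Another element of the same subspace S: x_N with the k = 1 coefficient perturbed by 0.01.
    return x_N(t) + 0.01 * cmath.exp(2j * math.pi * t / T)


energy_time = energy(x)                                      # ||x||^2
energy_freq = T * sum(abs(c) ** 2 for c in X.values())       # T sum |X_k|^2, as in (3.97b)
err_time = energy(lambda t: x(t) - x_N(t))                   # ||x - x_N||^2
err_freq = T * sum(abs(c) ** 2 for k, c in X.items() if abs(k) > N)
err_other = energy(lambda t: x(t) - y_N(t))                  # larger by T * 0.01^2

if __name__ == "__main__":
    print(energy_time, energy_freq)
    print(err_time, err_freq, err_other)
```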

Relation of the Fourier Series to the Fourier Transform What is the relation of
the Fourier series coefficients of a periodic function $x(t)$ to the Fourier transform
of one period of that same function? Call that one period $I = [-T/2, T/2)$, and let
$x_I(t) = x(t)\chi_{[-T/2,T/2)}(t)$ be the restriction of $x(t)$ to $I$, that is, $x_I$ is equal to $x(t)$ on
$I$ and is 0 otherwise. It is easy to verify (Solved Exercise 3.4) that

$$X_k = \frac{1}{T}X_I\Big(\frac{2\pi}{T}k\Big), \tag{3.98}$$

where $X_I$ is the Fourier transform of $x_I$. In other words, the Fourier series coeffi-
cients of $x(t)$ are samples of the Fourier transform of the same function restricted
to one period $I = [-T/2, T/2)$, scaled by $1/T$.
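Relation (3.98) can be checked on the box function of (3.76) placed inside one period. In the sketch below, the period $T$, width $t_0$, the midpoint-rule resolution and the function names are assumptions chosen for illustration. The Fourier-series coefficients are integrated numerically and compared with $(1/T)$ times samples of $\sqrt{t_0}\,\mathrm{sinc}(\omega t_0/2)$ at $\omega = (2\pi/T)k$.

```python
import cmath
import math

T, t0 = 2.0, 0.5  # assumed period and box width, with t0 < T


def box(t):
    """One period x_I(t): 1/sqrt(t0) for |t| < t0/2 and 0 elsewhere in [-T/2, T/2)."""
    return 1.0 / math.sqrt(t0) if abs(t) < t0 / 2 else 0.0


def fs_coeff_midpoint(k, n=4000):
    """X_k from (3.90a); x_I vanishes outside |t| < t0/2, so the midpoint rule runs only there."""
    h = t0 / n
    acc = 0j
    for i in range(n):
        t = -t0 / 2 + (i + 0.5) * h
        acc += box(t) * cmath.exp(-2j * math.pi * k * t / T)
    return acc * h / T


def box_ft(w):
    """X_I(w) = sqrt(t0) sinc(w t0/2) from (3.76), with sinc(u) = sin(u)/u and sinc(0) = 1."""
    u = w * t0 / 2
    return math.sqrt(t0) * (math.sin(u) / u if u != 0 else 1.0)


if __name__ == "__main__":
    for k in range(-3, 4):
        # Left: numerically integrated coefficient; right: (1/T) X_I(2 pi k / T) as in (3.98).
        print(k, fs_coeff_midpoint(k).real, box_ft(2 * math.pi * k / T) / T)
```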

Relation of the Fourier Series to the DTFT Consider the Fourier series of a
$2\pi$-periodic function, or, from (3.90a),

$$X_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} x(t)e^{-jkt}\,dt. \tag{3.99}$$

We can recognize this as the inverse DTFT in (2.78b). In other words, the in-
verse DTFT expresses the sequence $x_n$ as the Fourier series of a $2\pi$-periodic DTFT
$X(e^{j\omega})$. Conversely, a periodic function $x(t)$ can be seen as the DTFT of the Fourier
series sequence $X_k$. Table 3.5 summarizes this duality, which helps us understand con-
cepts in one domain, such as convergence and orthogonality, through
the same concepts in the other.
3.5.2 Existence of the Fourier Series

When the Fourier series coefficients of a $T$-periodic function are absolutely summable,
then, just as in the Fourier-transform case, we have an inversion formula.

Theorem 3.17 ($\ell^1$ inversion) Given a $T$-periodic $x(t)$ with one period in $\mathcal{L}^1(\mathbb{R})$ and
Fourier series coefficients $X_k \in \ell^1(\mathbb{Z})$, then for almost all $t$,

$$x(t) = \sum_{k\in\mathbb{Z}} X_k e^{jk\omega_0 t}, \tag{3.100}$$

where $\omega_0 = 2\pi/T$. In addition, if $x(t)$ is continuous, then equality holds everywhere.

The proof is similar to the equivalent proof for the Fourier transform. From
the inversion formula, the following uniqueness holds: if two functions have the same
period $T$ and the same Fourier coefficients, then they are equal almost everywhere
(Exercise 3.13).

3.5.3 Properties of the Fourier Series 

Basic Properties 

We list here the basic properties of the Fourier series (assuming it exists); Table 3.4
summarizes these, together with symmetries as well as standard transform pairs.

Linearity The Fourier series operator $F$ is a linear operator, or,

$$\alpha x(t) + \beta y(t) \;\overset{\text{FS}}{\longleftrightarrow}\; \alpha X_k + \beta Y_k. \tag{3.101}$$

Shift in Time The Fourier-series pair corresponding to a shift in time by $t_0$ is

$$x(t - t_0) \;\overset{\text{FS}}{\longleftrightarrow}\; e^{-j(2\pi/T)kt_0}X_k. \tag{3.102}$$

Shift in Frequency The Fourier-series pair corresponding to a shift in frequency
by $k_0$ is

$$e^{j(2\pi/T)k_0t}x(t) \;\overset{\text{FS}}{\longleftrightarrow}\; X_{k-k_0}. \tag{3.103}$$

As in Chapter 2, a shift in frequency is often referred to as modulation in time, and
is dual to the shift in time.

Time Reversal The Fourier-series pair corresponding to a time reversal $x(-t)$ is

$$x(-t) \;\overset{\text{FS}}{\longleftrightarrow}\; X_{-k}. \tag{3.104}$$
FS properties | Time domain | FS domain
Basic properties | |
Linearity | $\alpha x(t) + \beta y(t)$ | $\alpha X_k + \beta Y_k$
Shift in time | $x(t - t_0)$ | $e^{-j(2\pi/T)kt_0} X_k$
Shift in frequency | $e^{j(2\pi/T)k_0 t} x(t)$ | $X_{k-k_0}$
Time reversal | $x(-t)$ | $X_{-k}$
Differentiation | $d^n x(t)/dt^n$ | $(j2\pi k/T)^n X_k$
Integration | $\int_{-T/2}^{t} x(\tau)\,d\tau$ | $(T/(j2\pi k)) X_k$, with $X_0 = 0$
Convolution in time | $(h \circledast x)(t)$ | $T H_k X_k$
Convolution in frequency | $h(t)x(t)$ | $(H * X)_k$
Deterministic autocorrelation | $a(t) = \int_{-T/2}^{T/2} x(\tau)x^*(\tau - t)\,d\tau$ | $A_k = T\lvert X_k\rvert^2$
Deterministic crosscorrelation | $c(t) = \int_{-T/2}^{T/2} x(\tau)y^*(\tau - t)\,d\tau$ | $C_k = T X_k Y_k^*$
Parseval's equality | $\lVert x\rVert^2 = \int_{-T/2}^{T/2} \lvert x(t)\rvert^2\,dt$ | $= T\sum_{k\in\mathbb{Z}} \lvert X_k\rvert^2 = T\lVert X\rVert^2$
Symmetries | |
Conjugate | $x^*(t)$ | $X_{-k}^*$
Conjugate, time reversed | $x^*(-t)$ | $X_k^*$
Real part | $\Re(x(t))$ | $(X_k + X_{-k}^*)/2$
Imaginary part | $\Im(x(t))$ | $(X_k - X_{-k}^*)/(2j)$
Conjugate-symmetric part | $(x(t) + x^*(-t))/2$ | $\Re(X_k)$
Conjugate-antisymmetric part | $(x(t) - x^*(-t))/(2j)$ | $\Im(X_k)$
Symmetries for real $x$ | |
$X$ conjugate symmetric | | $X_k = X_{-k}^*$
Real part of $X$ even | | $\Re(X_k) = \Re(X_{-k})$
Imaginary part of $X$ odd | | $\Im(X_k) = -\Im(X_{-k})$
Magnitude of $X$ even | | $\lvert X_k\rvert = \lvert X_{-k}\rvert$
Phase of $X$ odd | | $\arg X_k = -\arg X_{-k}$
Common transform pairs | |
Ideal lowpass filter | $\sqrt{k_0/T}\,\mathrm{sinc}(\pi k_0 t/T)/\mathrm{sinc}(\pi t/T)$ | $1/\sqrt{k_0 T}$ for $\lvert k\rvert \le (k_0-1)/2$; 0 otherwise
Box function | $1/\sqrt{t_0}$ for $\lvert t\rvert < t_0/2$; 0 otherwise | $(\sqrt{t_0}/T)\,\mathrm{sinc}(\pi k t_0/T)$

Table 3.4: Properties of the Fourier series.



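Differentiation The Fourier-series pair corresponding to differentiation in time is

$$\frac{d^n x(t)}{dt^n} \;\overset{\text{FS}}{\longleftrightarrow}\; \Big(j\frac{2\pi}{T}k\Big)^n X_k. \tag{3.105}$$

For $n = 1$, this can be derived by integration by parts:

$$\frac{1}{T}\int_{-T/2}^{T/2} x'(t)e^{-j(2\pi/T)kt}\,dt = \frac{1}{T}x(t)e^{-j(2\pi/T)kt}\Big|_{-T/2}^{T/2} - \frac{1}{T}\int_{-T/2}^{T/2} x(t)\Big(-j\frac{2\pi}{T}k\Big)e^{-j(2\pi/T)kt}\,dt = j\frac{2\pi}{T}k\,\frac{1}{T}\int_{-T/2}^{T/2} x(t)e^{-j(2\pi/T)kt}\,dt = j\frac{2\pi}{T}k\,X_k,$$

where the boundary term vanishes because $x(t)$ is $T$-periodic and $e^{\mp j(2\pi/T)k(T/2)} = (-1)^k$;
higher $n$ follow by repeating the argument.

A formula for the primitive of $x(t)$, assuming it has zero mean ($X_0 = 0$), can be
derived similarly (Exercise 3.14).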
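The differentiation property (3.105) with $n = 1$ can be verified numerically on any smooth periodic function whose derivative is known. The sketch below uses the assumed example $x(t) = e^{\cos(\omega_0 t)}$ with $T = 1.5$ and a rectangle-rule approximation of (3.90a), which is highly accurate for smooth periodic integrands.

```python
import cmath
import math

T = 1.5
w0 = 2 * math.pi / T
n_samples = 256
ts = [-T / 2 + i * T / n_samples for i in range(n_samples)]


def x(t):
    # Assumed smooth T-periodic example.
    return math.exp(math.cos(w0 * t))


def dx(t):
    # Its derivative, computed by hand: x'(t) = -w0 sin(w0 t) exp(cos(w0 t)).
    return -w0 * math.sin(w0 * t) * math.exp(math.cos(w0 * t))


def fs_coeff(f, k):
    """Rectangle-rule approximation of (3.90a), accurate for smooth periodic f."""
    return sum(f(t) * cmath.exp(-1j * w0 * k * t) for t in ts) / n_samples


if __name__ == "__main__":
    for k in range(-4, 5):
        lhs = fs_coeff(dx, k)               # coefficient of x'(t)
        rhs = 1j * w0 * k * fs_coeff(x, k)  # (3.105) with n = 1
        print(k, abs(lhs - rhs))
```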

Integration The Fourier-series pair corresponding to integration in time is (with
$X_0 = 0$, and for $k \neq 0$)

$$\int_{-T/2}^{t} x(\tau)\,d\tau \;\overset{\text{FS}}{\longleftrightarrow}\; \frac{T}{j2\pi k}X_k. \tag{3.106}$$

Convolution in Time As expected, given a periodic function $x$ and a filter $h$ (not
necessarily periodic), the convolution property states that, in the Fourier domain,
their convolution maps to the product of the Fourier coefficients of the function and
the frequency response of the filter:

$$(h\circledast x)(t) \;\overset{\text{FS}}{\longleftrightarrow}\; T H_k X_k. \tag{3.107}$$

Exercise 3.15 proves the property when the filter $h$ is a periodized version of an
infinite-length filter, and uses the fact that the frequency response of $h$ is obtained
by sampling the frequency response of that infinite-length filter.

Convolution in Frequency The Fourier-series pair corresponding to convolution
in frequency is

$$h(t)x(t) \;\overset{\text{FS}}{\longleftrightarrow}\; (H * X)_k = \sum_{m\in\mathbb{Z}} H_m X_{k-m}, \tag{3.108}$$

as is to be expected by the duality of time and frequency.

Deterministic Autocorrelation The Fourier-series pair corresponding to the de-
terministic autocorrelation of a function $x(t)$ is

$$a(t) = \int_{-T/2}^{T/2} x(\tau)x^*(\tau - t)\,d\tau \;\overset{\text{FS}}{\longleftrightarrow}\; A_k = T|X_k|^2. \tag{3.109}$$

Deterministic Crosscorrelation The Fourier-series pair corresponding to the de-
terministic crosscorrelation of functions $x(t)$ and $y(t)$ is

$$c(t) = \int_{-T/2}^{T/2} x(\tau)y^*(\tau - t)\,d\tau \;\overset{\text{FS}}{\longleftrightarrow}\; C_k = T X_k Y_k^*. \tag{3.110}$$
Parseval's Equality The Fourier-series operator $F$ is a unitary operator (within
scaling) and thus preserves the Euclidean norm (see (1.51)):

$$\|x\|^2 = T\|Fx\|^2. \tag{3.111}$$

This follows from

$$\|x\|^2 = \int_{-T/2}^{T/2} |x(t)|^2\,dt = T\sum_{k\in\mathbb{Z}} |X_k|^2 = T\|X\|^2 = T\|Fx\|^2.$$

Transform Pairs

We consider a couple of simple periodic functions to illustrate the behavior of Fourier
series by applying some of the properties seen above.

Square Wave Consider a square wave of period $T = 1$, with one period given by

$$x(t) = \begin{cases} 1, & -1/2 \le t < 0;\\ -1, & 0 \le t < 1/2. \end{cases} \tag{3.112}$$

This function is real and antisymmetric; its Fourier series coefficients are thus purely
imaginary (see Table 3.5):

$$X_k = \int_{-1/2}^{0} e^{-j2\pi kt}\,dt - \int_{0}^{1/2} e^{-j2\pi kt}\,dt = \frac{1}{j\pi k}\big(\cos(\pi k) - 1\big) = \frac{1}{j\pi k}\big((-1)^k - 1\big) = \begin{cases} 2j/(\pi k), & k \text{ odd};\\ 0, & k \text{ even}. \end{cases} \tag{3.113}$$

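Triangle Wave Consider a triangle wave of period $T = 1$, with one period given
by

$$y(t) = \int_{-1/2}^{t} x(\tau)\,d\tau = 1/2 - |t|, \quad |t| < 1/2. \tag{3.114}$$

We can use the integration property (3.106) to find the coefficients for $k \neq 0$ (the
mean $Y_0 = 1/4$ is computed directly):

$$Y_k = \begin{cases} 1/4, & k = 0;\\ 0, & k \text{ even},\ k \neq 0;\\ 1/(\pi k)^2, & k \text{ odd}. \end{cases} \tag{3.115}$$

Exercise 3.16 explores alternate ways to compute this Fourier series.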
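The coefficients (3.113) and (3.115) can be confirmed numerically. The sketch below integrates each half period separately with the midpoint rule, so that the jump of the square wave (taken, as in the computation of (3.113), to be $+1$ on $[-1/2, 0)$ and $-1$ on $[0, 1/2)$) does not spoil the accuracy. As an extra check, it evaluates Parseval's equality (3.97b) for the triangle wave, whose energy over one period is $\int_{-1/2}^{1/2}(1/2 - |t|)^2\,dt = 1/12$. Names and resolutions are illustrative assumptions.

```python
import cmath
import math


def coeff_midpoint(f, k, a, b, n=4000):
    """int_a^b f(t) e^{-j 2 pi k t} dt by the midpoint rule (period T = 1, so 1/T = 1)."""
    h = (b - a) / n
    acc = 0j
    for i in range(n):
        t = a + (i + 0.5) * h
        acc += f(t) * cmath.exp(-2j * math.pi * k * t)
    return acc * h


def square(t):
    # Square wave: +1 on [-1/2, 0), -1 on [0, 1/2).
    return 1.0 if t < 0 else -1.0


def triangle(t):
    # Triangle wave (3.114): 1/2 - |t| on [-1/2, 1/2).
    return 0.5 - abs(t)


def X_closed(k):
    """(3.113)."""
    return 2j / (math.pi * k) if k % 2 else 0.0


def Y_closed(k):
    """(3.115)."""
    if k == 0:
        return 0.25
    return 1.0 / (math.pi * k) ** 2 if k % 2 else 0.0


def coeff(f, k):
    # Split at t = 0 so each piece is smooth and the midpoint rule converges as O(h^2).
    return coeff_midpoint(f, k, -0.5, 0.0) + coeff_midpoint(f, k, 0.0, 0.5)


# Parseval (3.97b) for the triangle wave with T = 1: the sum should approach 1/12.
parseval_sum = sum(abs(Y_closed(k)) ** 2 for k in range(-2001, 2002))

if __name__ == "__main__":
    for k in range(-5, 6):
        print(k, coeff(square, k), X_closed(k), coeff(triangle, k).real, Y_closed(k))
    print(parseval_sum, 1.0 / 12.0)
```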

Decay and Smoothness

Similarly to the Fourier transform, smoothness of the periodic function and decay
of its Fourier coefficients are related, as seen, for example, in (3.98), which relates
the Fourier transform of one period with the Fourier coefficients. In (3.113), we
saw that a discontinuous function such as the square wave has Fourier coefficients
decaying only as $O(1/k)$, while a continuous function such as the triangle wave leads
to a decay of $O(1/k^2)$.

More generally, using the derivative property, one can show that if

$$\sum_{k\in\mathbb{Z}} |k|^p|X_k| < \infty, \tag{3.116}$$

then $x(t)$ is $p$ times continuously differentiable. For example, the triangle wave satisfies
the above for $p = 0$ and is thus continuous. Exercise 3.17 explores the sawtooth function and
the Gibbs phenomenon.
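The slow $O(1/k)$ decay of the square-wave coefficients is also behind the Gibbs phenomenon mentioned above. Pairing the terms $k$ and $-k$ in (3.113) gives $x_N(t) = -(4/\pi)\sum_{k\ \mathrm{odd},\,k\le N}\sin(2\pi k t)/k$ for the square wave that is $+1$ on $[-1/2, 0)$ and $-1$ on $[0, 1/2)$. The sketch below, with illustrative grid and truncation choices, locates the overshoot next to the jump. Its height approaches the known limit $(2/\pi)\,\mathrm{Si}(\pi) \approx 1.179$ rather than 1, no matter how large $N$ is.

```python
import math


def square_partial_sum(t, N):
    """x_N(t) for the square wave, with the k and -k terms of (3.113) combined:
    x_N(t) = -(4/pi) * sum over odd k <= N of sin(2 pi k t) / k."""
    return -(4.0 / math.pi) * sum(math.sin(2 * math.pi * k * t) / k for k in range(1, N + 1, 2))


N = 399  # 200 odd harmonics
# Search for the overshoot just to the left of the jump at t = 0.
grid = [-0.01 + i * 2.5e-6 for i in range(4000)]
peak = max(square_partial_sum(t, N) for t in grid)
mid = square_partial_sum(-0.25, N)  # middle of the flat part, where x(t) = 1

if __name__ == "__main__":
    print(peak, mid)
```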

3.5.4 Frequency Response of Filters

The frequency response of a filter and the corresponding impulse response are given by:

$$H_k = \frac{1}{T}\int_{-T/2}^{T/2} h(t)e^{-j(2\pi/T)kt}\,dt, \quad k\in\mathbb{Z}, \tag{3.117a}$$

$$h(t) = \sum_{k\in\mathbb{Z}} H_k e^{j(2\pi/T)kt}, \quad t\in\mathbb{R}. \tag{3.117b}$$

Diagonalization of the Circular Convolution Operator Again, from (3.89) we
can immediately see that the Fourier series operator $F$ in $X = Fx$ diagonalizes the
convolution operator $H$ as in (3.42). As this is a central concept, we show it here
explicitly. We start with the output of the circular convolution of $h$ and $x$:

$$y(t) \overset{(a)}{=} \sum_{k\in\mathbb{Z}} Y_k e^{j(2\pi/T)kt} \overset{(b)}{=} \sum_{k\in\mathbb{Z}} T H_k X_k e^{j(2\pi/T)kt} \overset{(c)}{=} \sum_{k\in\mathbb{Z}} T H_k \Big(\frac{1}{T}\int_{-T/2}^{T/2} x(\tau)e^{-j(2\pi/T)k\tau}\,d\tau\Big)e^{j(2\pi/T)kt},$$

where (a) follows from the definition of the inverse Fourier series, (3.90b); (b) from
the convolution-in-time property (3.107); and (c) from the definition of the Fourier
series, (3.90a). Figure 3.10 illustrates this diagonalization property.
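The diagonalization can be observed numerically by discretizing everything with the rectangle rule. The circular convolution becomes a cyclic sum over samples (indices taken modulo $n$), and its Fourier-series coefficients should equal $T H_k X_k$ as in (3.107). The sketch below assumes the hypothetical trigonometric polynomials $x(t) = \cos(\omega_0 t) + 0.5\sin(2\omega_0 t)$ and $h(t) = 1 + \cos(\omega_0 t)$ with $T = 2$, for which the discretization is exact up to rounding. It integrates over $[0, T)$ instead of $[-T/2, T/2)$, which does not change the coefficients of a $T$-periodic function.

```python
import cmath
import math

T = 2.0
n = 64  # samples per period
w0 = 2 * math.pi / T
ts = [i * T / n for i in range(n)]  # one period [0, T)


def x(t):
    # Hypothetical input: cos(w0 t) + 0.5 sin(2 w0 t).
    return math.cos(w0 * t) + 0.5 * math.sin(2 * w0 * t)


def h(t):
    # Hypothetical periodic filter: 1 + cos(w0 t).
    return 1.0 + math.cos(w0 * t)


def fs_coeff(samples, k):
    """(3.90a) by the rectangle rule on ts (exact for these low-degree trigonometric polynomials)."""
    return sum(s * cmath.exp(-1j * w0 * k * t) for s, t in zip(samples, ts)) / n


xs = [x(t) for t in ts]
hs = [h(t) for t in ts]
# Circular convolution y(t_i) = int x(t_i - tau) h(tau) dtau, also by the rectangle rule;
# the index (i - m) % n implements the periodic wrap-around of x.
ys = [sum(xs[(i - m) % n] * hs[m] for m in range(n)) * T / n for i in range(n)]

if __name__ == "__main__":
    for k in range(-3, 4):
        lhs = fs_coeff(ys, k)
        rhs = T * fs_coeff(hs, k) * fs_coeff(xs, k)  # (3.107)
        print(k, abs(lhs - rhs))
```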

3.6 Continuous Stochastic Processes and Systems

As in Chapter 2, we now consider processes and systems in the presence of un-
certainty. We use standard probability theory, introduced in Section 1.C, to model
uncertainty. To introduce the stochastic framework, this section follows the same
structure as the chapter: we start with continuous stochastic processes (ran-
dom processes), followed by systems (almost exclusively LSI systems) and the associ-
ated functions, both in the time domain, as averages in the form of means, variances and
correlation functions, and in the frequency domain, such as the power spectral density.
[Figure 3.10 block diagram: $x(t)$ → Fourier series → coefficients scaled by the frequency response → inverse Fourier series → $y(t)$.]

Figure 3.10: Diagonalization property of the Fourier series.

3.6.1 Processes 

A continuous stochastic process is a function supported on R, characterized by 
specifying the PDF for all tuples (£1,^2, ■ ■ • , tjsf), N arbitrary; in other words, it is 
a random process (see Section ll.CD . If all random variables have the same distribu- 
tion and are independent, the process is called i.i.d. (independent and identically 
distributed). We use the following functions defined on a continuous stochastic 



process 



75 



mean 
variance 

standard deviation 
autocorrelation 



fi x (t) E[x(t)}; 

varO(t)) E [(x(t) - ft »(*)) 2 ] 

a x (t) v /var(x(i)); 

a x {r,t) E[x(T)x*(T-t)\. 



(3.118) 



Most of the time we will assume we are dealing with continuous wide-sense station- 
ary (WSS) processes, that is, those whose mean is constant and autocorrelation 
depends only on t: 

$$\mu_x(t) = \mu_x, \qquad a_x(\tau, t) = a_x(t). \qquad (3.119)$$

This assumption often holds, at least over a portion of time of a given process.

White Noise A particular continuous stochastic process appearing widely in signal
processing is the white noise⁷⁶ process x, whose mean is zero and whose autocorrelation
is a_x(t) = σ_x² δ(t), that is,

$$\mu_x(t) = 0; \qquad \operatorname{var}(x(t)) = \sigma_x^2; \qquad a_x(t) = \sigma_x^2\, \delta(t). \qquad (3.120)$$



If the underlying PDF is Gaussian, x is called white Gaussian noise, or sometimes, 
additive white Gaussian noise (AWGN). 

The white noise process is uncorrelated, but not necessarily independent (when the PDF
is Gaussian, it is automatically independent as well). The terms whitening and
decorrelation refer to transforming a given process so that it has zero mean and a
Dirac delta-like autocorrelation function; this is essentially a diagonalization of its
covariance.
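As a finite-dimensional illustration of whitening as a diagonalization of the covariance, the Python sketch below whitens samples of a 2-D random vector. The data model (a correlated pair built from two independent Gaussians) is hypothetical, and using the Cholesky factorization C = L Lᵀ of the sample covariance is only one valid choice; an eigendecomposition of C would serve equally well.

```python
import math
import random


def mean_and_cov(samples):
    """Sample mean and (1/N) sample covariance of a list of 2-D samples."""
    n = len(samples)
    m = [sum(s[i] for s in samples) / n for i in range(2)]
    c = [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / n
          for j in range(2)]
         for i in range(2)]
    return m, c


def whiten(samples):
    """Remove the mean and apply L^{-1}, where C = L L^T is the Cholesky
    factorization of the sample covariance C (2x2 case written out)."""
    m, c = mean_and_cov(samples)
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 * l21)
    out = []
    for s0, s1 in samples:
        u0, u1 = s0 - m[0], s1 - m[1]
        w0 = u0 / l11                  # forward substitution for L w = u
        w1 = (u1 - l21 * w0) / l22
        out.append((w0, w1))
    return out


random.seed(1)
data = []
for _ in range(5000):
    a, b = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    data.append((2.0 + a, 0.8 * a + 0.3 * b))   # hypothetical correlated pair

_, cov_before = mean_and_cov(data)
mean_after, cov_after = mean_and_cov(whiten(data))
print(cov_before)   # clearly nonzero off-diagonal entries
print(cov_after)    # identity matrix up to rounding
```

Because the transform is built from the same sample covariance it is applied to, the whitened sample covariance is the identity up to floating-point rounding.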



75 Although we have seen a version of these in Section 1.C, we repeat them here for completeness.
76 As in Chapter 2, the Fourier transform of the autocorrelation of white noise is a constant,
mimicking the behavior of the spectrum of white light; thus the term white noise.
3.6.2 Systems

We now assume that our input x is a continuous WSS process, and the system is LSI,
described by its impulse response h, as in Section 2.3.3. Note that the system is
deterministic, given by a fixed impulse response h. What can we say about the output?
It is given by (2.58), and we can compute on the output the same functions we computed
on the input (mean, variance, standard deviation and autocorrelation). We start with the
mean:

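$$
\mu_y(t) = E[y(t)] \overset{(a)}{=} E\Big[\int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau\Big]
\overset{(b)}{=} \int_{-\infty}^{\infty} E[x(\tau)]\, h(t - \tau)\, d\tau
\overset{(c)}{=} \mu_x \int_{-\infty}^{\infty} h(t - \tau)\, d\tau
\overset{(d)}{=} \mu_x H(0) = \mu_y, \qquad (3.121a)
$$

where (a) follows from (3.36); (b) from the linearity of the expectation; (c) from x
being WSS; and (d) from the frequency response of the LSI system evaluated at ω = 0,
H(0) = ∫ h(t) dt. We can thus see that the mean of the output is a constant,
independent of t. The variance is

$$\operatorname{var}(y(t)) = a_y(0) - (\mu_y)^2, \qquad (3.121b)$$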

and the autocorrelation of the output is

$$
\begin{aligned}
a_y(\tau, t) &= E[y(\tau)\, y^*(\tau - t)]\\
&\overset{(a)}{=} E\Big[\int_{-\infty}^{\infty} x(\tau - q)\, h(q)\, dq \int_{-\infty}^{\infty} x^*(\tau - t - r)\, h^*(r)\, dr\Big]\\
&\overset{(b)}{=} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(q)\, h^*(r)\, E[x(\tau - q)\, x^*(\tau - t - r)]\, dr\, dq\\
&\overset{(c)}{=} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(q)\, h^*(r)\, a_x(t - (q - r))\, dr\, dq\\
&\overset{(d)}{=} \int_{-\infty}^{\infty} \Big(\int_{-\infty}^{\infty} h(q)\, h^*(q - p)\, dq\Big)\, a_x(t - p)\, dp\\
&\overset{(e)}{=} \int_{-\infty}^{\infty} a_h(p)\, a_x(t - p)\, dp = a_y(t),
\end{aligned}
\qquad (3.121c)
$$

where (a) follows from the definition of convolution (3.36); (b) from linearity of
expectation; (c) from the expression for the autocorrelation of the WSS x, (3.118);
(d) from the change of variable p = q − r; and (e) from the definition of deterministic
autocorrelation (3.14). From this, we can see that if the input is WSS, the output is
WSS as well (as the mean is a constant and the autocorrelation depends only on the
difference t). We also see that the autocorrelation of the output is the convolution
of the stochastic autocorrelation of the input and the deterministic autocorrelation
of the impulse response of the system.
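The result (3.121c) can be checked numerically on a discrete-time analog, which is an assumption of this sketch rather than the continuous-time setting itself: with a white input, a_x = σ_x² δ, the output autocorrelation reduces to σ_x² a_h. The Python code below filters white Gaussian noise with an illustrative three-tap filter h (our choice) and compares time-averaged estimates of E[y_n y_{n−k}] (assuming ergodicity, so time averages stand in for expectations) with σ_x² a_h(k).

```python
import random


def filter_fir(h, x):
    """Causal FIR filtering y_n = sum_m h_m x_{n-m}, taking x_n = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for m, hm in enumerate(h):
            if n - m >= 0:
                acc += hm * x[n - m]
        y.append(acc)
    return y


def deterministic_autocorr(h, k):
    """a_h(k) = sum_q h_q h_{q-k} for a real FIR filter h."""
    return sum(h[q] * h[q - k] for q in range(len(h)) if 0 <= q - k < len(h))


def empirical_autocorr(y, k):
    """Time-average estimate of E[y_n y_{n-k}] (assumes ergodicity)."""
    return sum(y[n] * y[n - k] for n in range(k, len(y))) / (len(y) - k)


random.seed(0)
sigma = 1.0
h = [1.0, 0.5, 0.25]                                   # hypothetical filter
x = [random.gauss(0.0, sigma) for _ in range(100_000)] # white Gaussian input
y = filter_fir(h, x)

for k in range(3):
    print(k, empirical_autocorr(y, k), sigma ** 2 * deterministic_autocorr(h, k))
```

The estimates agree with σ_x² a_h(k) up to statistical fluctuations of order 1/√(number of samples).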
Similarly, we compute the crosscorrelation between the input and the output:

$$
\begin{aligned}
c_{x,y}(\tau, t) &= E[x(\tau)\, y^*(\tau - t)]
\overset{(a)}{=} E\Big[x(\tau) \int_{-\infty}^{\infty} h^*(r)\, x^*(\tau - t - r)\, dr\Big]\\
&= E\Big[\int_{-\infty}^{\infty} h^*(r)\, x(\tau)\, x^*(\tau - (t + r))\, dr\Big]
\overset{(b)}{=} \int_{-\infty}^{\infty} h^*(r)\, E[x(\tau)\, x^*(\tau - (t + r))]\, dr\\
&\overset{(c)}{=} \int_{-\infty}^{\infty} h^*(r)\, a_x(t + r)\, dr,
\end{aligned}
\qquad (3.121d)
$$

where (a) follows from (3.36); (b) from linearity of expectation; and (c) from the
expression for the autocorrelation of the WSS x, (3.118). We will use the above
expressions shortly to make some important observations in the Fourier domain.

3.6.3 Fourier Transform 

As for deterministic functions, we can use Fourier techniques to gain insight into
the behavior of continuous stochastic processes and systems. A continuous stochastic
process is in general neither absolutely nor square integrable, so we cannot take its
Fourier transform; instead, we make assessments based on averages (moments). In
particular, we can take the Fourier transform of the autocorrelation, and we do this
right now. Let us assume a WSS x and find the Fourier transform of its autocorrelation
(3.118) (which we assume decays fast enough to be absolutely, or at least square,
integrable):

$$A_x(\omega) = \int_{-\infty}^{\infty} a_x(t)\, e^{-j\omega t}\, dt, \qquad (3.122)$$



which is called the power spectral density. The power spectral density exists if and
only if x is WSS, a result known as the Wiener-Khinchin theorem. When x is real, the
power spectral density is nonnegative, and thus admits a spectral factorization

$$A_x(\omega) = U(\omega)\, U^*(\omega),$$

where U(ω) is its nonunique spectral root.

Given (3.121c), the power spectral density of the output, that is, the Fourier
transform of its autocorrelation, can be expressed as

$$A_y(\omega) = A_h(\omega)\, A_x(\omega) = \lvert H(\omega)\rvert^2 A_x(\omega), \qquad (3.123)$$

where A_h(ω) = |H(ω)|² is the Fourier transform of the deterministic autocorrelation
of h, according to Table 3.2. The quantity

$$E[\lvert y(t)\rvert^2] = a_y(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \lvert H(\omega)\rvert^2 A_x(\omega)\, d\omega$$

is the average output power. Similarly to (3.123), and from (3.121d), we can express
the Fourier transform of the crosscorrelation between the input and the output as

$$C_{x,y}(\omega) = H^*(\omega)\, A_x(\omega). \qquad (3.124)$$
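The identity A_h(ω) = |H(ω)|² behind (3.123) has an exact finite, circular analog that is easy to verify: the DFT of the circular deterministic autocorrelation a_h[k] = Σ_q h_q h*_{(q−k) mod N} equals |H_k|². The Python sketch below checks this for a hypothetical zero-padded filter; it illustrates the identity and is not the continuous-time computation itself.

```python
import cmath


def dft(x):
    """X_k = sum_n x_n e^{-j(2 pi/N)kn}, as in Table 3.5."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def circular_autocorr(h):
    """a_h[k] = sum_q h[q] * conj(h[(q - k) mod N])."""
    N = len(h)
    return [sum(h[q] * complex(h[(q - k) % N]).conjugate() for q in range(N))
            for k in range(N)]


h = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]   # hypothetical filter, zero-padded
A_h = dft(circular_autocorr(h))
H = dft(h)
for k in range(len(h)):
    print(k, round(A_h[k].real, 6), round(abs(H[k]) ** 2, 6))   # columns agree
```

Multiplying |H_k|² by the constant spectrum of a white input then gives the output spectrum, the discrete counterpart of (3.123) combined with (3.125).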
White Noise Using (3.120) and Table 3.2, we see that the power spectral density
of white noise is a constant:

$$A_x(\omega) = \sigma_x^2, \qquad (3.125)$$

and white noise has infinite power, since its autocorrelation is a Dirac delta function.
Chapter at a Glance

By now, we have seen all the versions of the Fourier transform and series that will be used
in the sequel, summarized in Table 3.5 and Figure 3.11. These variants of the Fourier
transform differ in the underlying space of sequences or functions; in the Fourier domain
they are continuous and infinite (Fourier transform), discrete and infinite (Fourier series),
continuous, finite and circularly extended (DTFT), and discrete, finite and circularly
extended (DFT).
[Figure 3.11 panels: (a) Fourier transform. (b) Fourier series. (c) DTFT. (d) DFT.]

Figure 3.11: Various forms of Fourier transforms seen in this chapter and Chapter 2.



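| Transform | Forward / Inverse | Duality |
|---|---|---|
| Fourier transform | $X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$; $x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega$ | |
| Fourier series | $X_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, e^{-j(2\pi/T)kt}\, dt$; $x(t) = \sum_{k\in\mathbb{Z}} X_k\, e^{j(2\pi/T)kt}$, with $x(t + T) = x(t)$ | Dual with DTFT |
| Discrete-time Fourier transform | $X(e^{j\omega}) = \sum_{n\in\mathbb{Z}} x_n\, e^{-j\omega n}$; $x_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$ | Dual with Fourier series |
| Discrete Fourier transform | $X_k = \sum_{n=0}^{N-1} x_n\, e^{-j(2\pi/N)kn}$; $x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k\, e^{j(2\pi/N)kn}$ | |

Table 3.5: Various forms of Fourier transforms seen in this chapter and Chapter 2.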
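As a quick sanity check of the DFT row of Table 3.5, the following Python sketch implements the forward and inverse formulas directly (O(N²), for illustration only; an FFT would be used in practice) and verifies that they are inverses of each other on a small example sequence of our choosing.

```python
import cmath


def dft(x):
    """Forward DFT: X_k = sum_{n=0}^{N-1} x_n e^{-j(2 pi/N)kn}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT: x_n = (1/N) sum_{k=0}^{N-1} X_k e^{j(2 pi/N)kn}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)        # approximately [2, 1-3j, 0, 1+3j]
x_rec = idft(X)   # approximately [1, 2, 0, -1]
print(X)
print(x_rec)
```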

In both this chapter and the previous one, box and sinc functions played a prominent
role; for easy reference, they are summarized in the following table.
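Unit-Norm Box and Sinc Functions/Sequences

| Space | Domain | Box | Sinc |
|---|---|---|---|
| Functions on the real line | $x(t)$, $t\in\mathbb{R}$, $\lVert x\rVert = 1$ | $x(t) = 1/\sqrt{t_0}$ for $\lvert t\rvert < t_0/2$; $0$ otherwise | $x(t) = \sqrt{\omega_0/2\pi}\,\operatorname{sinc}(\omega_0 t/2)$ |
| FT | $X(\omega)$, $\omega\in\mathbb{R}$, $\lVert X\rVert = \sqrt{2\pi}$ | $X(\omega) = \sqrt{t_0}\,\operatorname{sinc}(\omega t_0/2)$ | $X(\omega) = \sqrt{2\pi/\omega_0}$ for $\lvert\omega\rvert < \omega_0/2$; $0$ otherwise |
| Periodic functions | $x(t)$, $t\in[-T/2, T/2)$, $\lVert x\rVert = 1$ | $x(t) = 1/\sqrt{t_0}$ for $\lvert t\rvert < t_0/2$; $0$ otherwise | $x(t) = \dfrac{\sqrt{k_0}\,\operatorname{sinc}(\pi k_0 t/T)}{\sqrt{T}\,\operatorname{sinc}(\pi t/T)}$ |
| FS | $X_k$, $k\in\mathbb{Z}$, $\lVert X\rVert = 1/\sqrt{T}$ | $X_k = \dfrac{\sqrt{t_0}}{T}\,\operatorname{sinc}(\pi k t_0/T)$ | $X_k = 1/\sqrt{k_0 T}$ for $\lvert k\rvert \le (k_0 - 1)/2$; $0$ otherwise |
| Infinite-length sequences | $x_n$, $n\in\mathbb{Z}$, $\lVert x\rVert = 1$ | $x_n = 1/\sqrt{n_0}$ for $\lvert n\rvert \le (n_0 - 1)/2$; $0$ otherwise | $x_n = \sqrt{\omega_0/2\pi}\,\operatorname{sinc}(\omega_0 n/2)$ |
| DTFT | $X(e^{j\omega})$, $\omega\in[-\pi, \pi)$, $\lVert X\rVert = \sqrt{2\pi}$ | $X(e^{j\omega}) = \sqrt{n_0}\,\dfrac{\operatorname{sinc}(\omega n_0/2)}{\operatorname{sinc}(\omega/2)}$ | $X(e^{j\omega}) = \sqrt{2\pi/\omega_0}$ for $\lvert\omega\rvert < \omega_0/2$; $0$ otherwise |
| Finite-length sequences | $x_n$, $n\in\{0, 1, \ldots, N-1\}$, $\lVert x\rVert = 1$ | $x_n = 1/\sqrt{n_0}$ for $\lvert n - N/2\rvert > (N - n_0)/2$; $0$ otherwise | $x_n = \dfrac{\sqrt{k_0}\,\operatorname{sinc}(\pi k_0 n/N)}{\sqrt{N}\,\operatorname{sinc}(\pi n/N)}$ |
| DFT | $X_k$, $k\in\{0, 1, \ldots, N-1\}$, $\lVert X\rVert = \sqrt{N}$ | $X_k = \sqrt{n_0}\,\dfrac{\operatorname{sinc}(\pi n_0 k/N)}{\operatorname{sinc}(\pi k/N)}$ | $X_k = \sqrt{N/k_0}$ for $\lvert k - N/2\rvert > (N - k_0)/2$; $0$ otherwise |

Table 3.6: Unit-norm box and sinc functions/sequences seen in this chapter and Chapter 2.
The box function/sequence is sometimes termed Dirichlet; thus, the box is the Dirichlet
in time and the sinc is the Dirichlet in frequency. Moreover, the box function/sequence
is usually used as a rectangular window, while the sinc function/sequence is the impulse
response of an ideal lowpass filter. For the DTFT, ω_0 = 2π/N leads to an ideal Nth-band
lowpass filter, while ω_0 = π leads to an ideal halfband lowpass filter. In the DFT, the
inequalities appear reversed to account for the circularity of the DFT.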
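The DFT box entry of Table 3.6 can be checked numerically. For simplicity, the Python sketch below places the box on n = 0, …, n_0 − 1 instead of circularly around n = 0 (an assumption of the sketch); a circular shift changes only the phase of X_k, so the magnitudes |X_k| = |sin(π n_0 k/N)| / (√n_0 |sin(πk/N)|), which equal √n_0 |sinc(π n_0 k/N)/sinc(πk/N)|, and the norm ‖X‖ = √N are unaffected.

```python
import cmath
import math


def dft(x):
    """X_k = sum_n x_n e^{-j(2 pi/N)kn}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def box_dft_magnitude(k, N, n0):
    """|X_k| for a unit-norm box of length n0: sqrt(n0) at k = 0, else a ratio of sines."""
    if k % N == 0:
        return math.sqrt(n0)
    return abs(math.sin(math.pi * n0 * k / N) / math.sin(math.pi * k / N)) / math.sqrt(n0)


N, n0 = 16, 5                                                 # hypothetical sizes
x = [1 / math.sqrt(n0) if n < n0 else 0.0 for n in range(N)]  # unit-norm box
X = dft(x)

norm_x = math.sqrt(sum(abs(v) ** 2 for v in x))   # 1
norm_X = math.sqrt(sum(abs(v) ** 2 for v in X))   # sqrt(N) = 4
print(norm_x, norm_X)
```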



Historical Remarks 

A giant featured in the title of this chapter, Jean Baptiste Joseph Fourier (1768-1830),
was a French mathematician and physicist who proposed his famous Fourier series
while working on the equations for heat flow. His interests were varied and his biography
unusual. He followed Napoleon to Egypt and spent a few years in Cairo, even contributing
a few papers to the Egyptian Institute Napoleon founded. He served as permanent
secretary of the French Academy of Sciences. He published "Théorie analytique de la
chaleur", in which he claimed that any function can be decomposed into a sum of sines
and cosines; while we know that is only partially true, it took mathematicians a long time
to tighten the result. Lagrange and Dirichlet both worked on it, with Dirichlet finally
formulating conditions under which the Fourier series converges.
Josiah Willard Gibbs (1839-1903) was an American math- 
ematician, physicist and chemist. Known for many significant 
contributions (among others as the inventor of vector analysis, 
independently of Heaviside), he was also the one to remark upon 
the unusual way the Fourier series behaves at a discontinuity; 
the Fourier series overshoots significantly, though in a controlled 
manner. Moreover, we can get into trouble by trying to differen- 
tiate the Fourier series. Karl Theodor Wilhelm Weierstrass 
(1815-1897) gave a famous example of a continuous function 
not differentiable anywhere. In 1926, Andrey Nikolaevich 
Kolmogorov (1903-1987) proved that there existed a func- 
tion, periodic and locally Lebesgue integrable, with a divergent Fourier series at all points. 
While this seemed like another strike against Fourier series, Lennart Carleson (b. 1928),
a Swedish mathematician, showed in 1966 that every periodic, locally square-integrable 
function has a Fourier series that converges almost everywhere. 




Further Reading 

Books and Textbooks We now give a sample list of books/textbooks in which more 
information can be found about various topics we discussed in this chapter. Some of them 
are standard in the signal processing community and others we have used while writing 
this book. 

More on the Dirac delta function can be found in [109] . Mallat wrote a book on 
wavelets and signal processing, similar in outlook to this one, but aimed at applied math- 
ematicians, which has a fair amount of material on harmonic analysis [100]. The book by 
Bremaud [19] is a clean, self-contained text aimed at the signal processing researcher. The 
text by Bracewell [17] is a classic; the material is written with engineering in mind, with 
plenty of intuition. The book by Papoulis [112] is another classic engineering text which 
has been used for quite some time. The material on stochastics can be found in the book 
by Porat [113]. Finally, the text by Folland [54] has been written from a physicist's point
of view, and offers an excellent treatment of partial differential equations. 



Exercises with Solutions 

3.1. Derivative of the Dirac Delta Function 

Given is a continuously differentiable function x(t). Show that

$$\int_{-\infty}^{\infty} x(t)\, \delta'(t)\, dt = -x'(0),$$

where δ'(t) is the derivative of the Dirac delta function.
Solution: Using integration by parts,

$$\int_{-\infty}^{\infty} x(t)\, \delta'(t)\, dt = x(t)\,\delta(t)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} x'(t)\, \delta(t)\, dt \overset{(a)}{=} -\int_{-\infty}^{\infty} x'(t)\, \delta(t)\, dt \overset{(b)}{=} -x'(0),$$

where (a) follows from x(t)δ(t) being zero at ±∞; and (b) from the sifting property of
the Dirac delta function (Table 3.1).
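The identity can also be checked numerically by replacing δ(t) with a narrow unit-area Gaussian of width s, whose derivative approximates δ'(t). In the Python sketch below, the test function x(t) = sin(2t) + t² (so x'(0) = 2), the width s and the midpoint-rule quadrature are our own illustrative choices; for this smoothed delta the integral equals −2e^{−2s²}, which tends to −x'(0) = −2 as s → 0.

```python
import math


def gaussian_derivative(t, s):
    """Derivative of a unit-area Gaussian of width s; tends to delta'(t) as s -> 0."""
    g = math.exp(-t * t / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return -t / (s * s) * g


def integrate(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt


def x(t):
    """Hypothetical test function with x'(0) = 2."""
    return math.sin(2 * t) + t * t


s = 0.01
val = integrate(lambda t: x(t) * gaussian_derivative(t, s), -20 * s, 20 * s, 40_000)
print(val)   # close to -x'(0) = -2
```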

3.2. Properties of the Convolution 

(i) Prove the associativity property (3.37c).
(ii) Derive a counterexample to associativity for some function not in L¹.

(Hint: Use a function such as a Heaviside function.)
(iii) Prove the deterministic autocorrelation property (3.37d).

Solution: TBD 

3.3. Fourier Transform of Functions in L¹(R)

Given is x(t) ∈ L¹(R). Show the following properties of its Fourier transform X(ω):

(i) X(ω) is bounded,
(ii) X(ω) is continuous.

(Hint: Use the dominated convergence theorem, (1.197), on (e^{−jωt} − e^{−jΩt}) x(t).)
Solution: 

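(i) To show that X(ω) is bounded, we write:

$$\lvert X(\omega)\rvert = \Big\lvert \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \Big\rvert \overset{(a)}{\le} \int_{-\infty}^{\infty} \lvert x(t)\, e^{-j\omega t}\rvert\, dt = \int_{-\infty}^{\infty} \lvert x(t)\rvert\, dt \overset{(b)}{<} \infty,$$

where (a) follows from Section 1.A.4 and (b) from x(t) ∈ L¹(R).
(ii) To show that X(ω) is continuous, consider (e^{−jωt} − e^{−jΩt}) x(t). Now,

$$\lvert (e^{-j\omega t} - e^{-j\Omega t})\, x(t)\rvert \le \lvert e^{-j\omega t} - e^{-j\Omega t}\rvert\, \lvert x(t)\rvert \le 2\,\lvert x(t)\rvert.$$

We thus have a positive dominating function that is integrable, and we can use the
dominated convergence theorem (see Appendix 1.A.3) to replace the limit of integrals
by the integral of the limit, that is,

$$\lim_{\omega\to\Omega} X(\omega) - X(\Omega) = \lim_{\omega\to\Omega} \int_{-\infty}^{\infty} (e^{-j\omega t} - e^{-j\Omega t})\, x(t)\, dt = \int_{-\infty}^{\infty} \lim_{\omega\to\Omega} (e^{-j\omega t} - e^{-j\Omega t})\, x(t)\, dt = 0,$$

proving continuity of X(ω).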



3.4. Relation of the Fourier Series to the Fourier Transform 

Verify (3.98), that is, that the Fourier series coefficients of x(t) are samples of the Fourier
transform of the same function restricted to the interval I = [−T/2, T/2).
Solution: The Fourier series coefficients of a periodic x(t) are given by (3.90a):

$$X_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, e^{-j(2\pi/T)kt}\, dt. \qquad \text{(E3.4-1)}$$

On the other hand, given a restriction of x(t) to the interval I = [−T/2, T/2) (x_I is equal
to x(t) on [−T/2, T/2) and is 0 otherwise), its Fourier transform is

$$X_I(\omega) = \int_{t\in\mathbb{R}} x_I(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} x(t)\, e^{-j\omega t}\, dt. \qquad \text{(E3.4-2)}$$

Comparing (E3.4-2) and (E3.4-1), we see that

$$X_k = \frac{1}{T}\, X_I\Big(\frac{2\pi}{T}k\Big). \qquad \text{(E3.4-3)}$$
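A numerical check of (E3.4-3) for one concrete case: the Python sketch below takes a hypothetical T-periodic box wave (1 on |t| < t_0/2, 0 elsewhere in the period), computes X_k by applying the midpoint rule to (E3.4-1), and compares the result with (1/T) X_I(2πk/T), where X_I(ω) = 2 sin(ω t_0/2)/ω is the closed-form Fourier transform of one period.

```python
import cmath
import math

T, t0 = 1.0, 0.5   # hypothetical period and box width


def x_period(t):
    """One period of a box wave: 1 for |t| < t0/2, 0 elsewhere in [-T/2, T/2)."""
    return 1.0 if abs(t) < t0 / 2 else 0.0


def fs_coeff(k, n=4000):
    """X_k from (E3.4-1), evaluated with the midpoint rule."""
    dt = T / n
    total = 0j
    for i in range(n):
        t = -T / 2 + (i + 0.5) * dt
        total += x_period(t) * cmath.exp(-2j * math.pi * k * t / T)
    return total * dt / T


def ft_restricted(w):
    """Closed-form X_I(w) = integral over [-t0/2, t0/2] of e^{-jwt} dt = 2 sin(w t0/2)/w."""
    if w == 0:
        return t0
    return 2 * math.sin(w * t0 / 2) / w


for k in range(-3, 4):
    print(k, fs_coeff(k), ft_restricted(2 * math.pi * k / T) / T)   # should agree
```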

3.5. Convolution and Sum of Continuous Random Variables 

Let x and y be independent continuous random variables with PDFs f_x and f_y. Show that
z = x + y has PDF f_x ∗ f_y.
Solution: For any t ∈ R,

$$
\begin{aligned}
F_z(t) = P(z \le t) = P(x + y \le t) &\overset{(a)}{=} \int_{-\infty}^{\infty}\int_{-\infty}^{t-u} f_{x,y}(s, u)\, ds\, du
\overset{(b)}{=} \int_{-\infty}^{\infty}\int_{-\infty}^{t-u} f_x(s)\, f_y(u)\, ds\, du\\
&\overset{(c)}{=} \int_{-\infty}^{\infty} f_y(u) \Big(\int_{-\infty}^{t-u} f_x(s)\, ds\Big) du
\overset{(d)}{=} \int_{-\infty}^{\infty} f_y(u)\, F_x(t - u)\, du,
\end{aligned}
$$

where (a) follows from expressing the region {(s, u) | s + u ≤ t}; (b) from the independence
of x and y; (c) from f_y(u) not depending on s; and (d) from the definition of F_x.
Differentiating with respect to t gives

$$f_z(t) = \int_{-\infty}^{\infty} f_y(u)\, f_x(t - u)\, du,$$

showing that f_z = f_x ∗ f_y.
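As a concrete check of f_z = f_x ∗ f_y, take x and y independent and uniform on [0, 1) (our choice of example). The convolution of the two box PDFs should give the triangular PDF f_z(t) = t for 0 ≤ t < 1 and f_z(t) = 2 − t for 1 ≤ t < 2. The Python sketch below evaluates the convolution integral with the midpoint rule and compares it with this closed form.

```python
def f_uniform(v):
    """PDF of a uniform random variable on [0, 1)."""
    return 1.0 if 0.0 <= v < 1.0 else 0.0


def f_sum(t, n=1000):
    """f_z(t) = integral of f_y(u) f_x(t - u) du; f_y vanishes outside [0, 1),
    so the midpoint rule on [0, 1) suffices."""
    du = 1.0 / n
    return sum(f_uniform((i + 0.5) * du) * f_uniform(t - (i + 0.5) * du)
               for i in range(n)) * du


def f_triangle(t):
    """Closed-form PDF of the sum of two independent uniform [0, 1) variables."""
    if 0.0 <= t < 1.0:
        return t
    if 1.0 <= t < 2.0:
        return 2.0 - t
    return 0.0


for t in (0.5, 1.0, 1.5, 2.5):
    print(t, f_sum(t), f_triangle(t))
```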



Exercises 

3.1. Properties of the Dirac Delta Function 
Recall the sequence d_n from (3.6).

(i) Prove that d_1, d_2, … does not converge in L² norm.

(Hint: Recall from Definition 1.13 that convergence in a vector space depends on
the norm. It is adequate to show that d_1, d_2, … is not a Cauchy sequence under
the L² norm.)

(ii) Prove that if x(t) is continuous at t_0, then

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} x(t)\, d_n(t - t_0)\, dt = x(t_0).$$

(iii) Prove that if x(t) is continuous at t_0, then

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} y(t)\, x(t)\, d_n(t - t_0)\, dt = \lim_{n\to\infty} \int_{-\infty}^{\infty} y(t)\, x(t_0)\, d_n(t - t_0)\, dt$$

for any function y(t).
(iv) Suppose x(t) is a continuously differentiable function. Show that

$$\int_{-\infty}^{\infty} x(t)\, \delta'(t)\, dt = -x'(0)$$

follows from the properties of the Dirac delta function and the assumption that
integration by parts is valid for expressions involving δ(t) and its derivative. Find
similar expressions for higher-order derivatives of δ(t).

3.2. BIBO Stability 

Given is an LSI system with impulse response h(t). The output is given by
the convolution integral (3.36),

$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau.$$

Prove that h(t) ∈ L¹(R) is a necessary and sufficient condition for BIBO stability.



3.3. Derivative of a Function 

Given is a function x(t) and its Fourier transform X(ω). Give a sufficient condition on
X(ω) for the derivative dx(t)/dt to be bounded.
3.4. Generalized Parseval's Equality

Prove that if x(t), y(t) are both in L¹ ∩ L², then

$$\langle x, y\rangle_t = \frac{1}{2\pi}\, \langle X, Y\rangle_\omega,$$

where X(ω), Y(ω) are their respective Fourier transforms.

(Hint: Express the inner product as a convolution of x(t) and y*(−t) evaluated at the
origin, and use the convolution theorem.)

3.5. Poisson Sum Formula 
Prove (3.75).

(Hint: Since S_T(ω) is 2π/T-periodic, show that on one period, S_T(ω) = (2π/T)δ(ω).
Since this is a distribution, show that for any continuous test function X(ω) with support
in [−π/T, π/T), the following holds:

$$\lim_{N\to\infty} \frac{T}{2\pi} \int \sum_{n=-N}^{N} e^{-j\omega nT}\, X(\omega)\, d\omega = X(0).$$
)



3.6. Application of Poisson Sum Formula 

Given is the function x(t) whose spectrum is given by X(ω) = 1/(1 + |ω|)^α for some
positive real number α. The spectrum along with its replicas is shown for α = 1.5 in
Figure P3.6-1. Determine a condition on α such that Σ_{n∈Z} x(n) converges.

[Figure P3.6-1 plot: the spectrum 1/(1 + |ω|)^{1.5} and its replicas.]

Figure P3.6-1: (a) Continuous spectrum of the function x(t). (b) Spectrum of the discrete
function x_n = x(n); the replicas of the spectrum are present. In the case of a slowly
decaying function, the discrete spectrum can become infinite.



3.7. Fourier transform of a Gaussian Function 

Prove that (3.78) forms a Fourier transform pair.

(Hint: See the discussion following (3.78). You can use the fact that

$$\int_{-\infty}^{\infty} e^{-t^2}\, dt = \sqrt{\pi}$$

to specify the constant.)



3.8. Function Decay 

Consider a signal a(t) having the Fourier transform A(ω) given in Figure P3.8-1.

(i) Write the expression for A(ω), and show that one possible real-valued solution of
the equation |Φ(ω)|² = A(ω) is given by

$$\Phi(\omega) = \begin{cases} \sqrt{1 - \lvert\omega\rvert/(2\pi)}, & \lvert\omega\rvert \le 2\pi;\\ 0, & \lvert\omega\rvert > 2\pi. \end{cases}$$

Figure P3.8-1: Fourier transform A(ω).

(ii) If φ(t) is the time-domain version of Φ(ω), show that

$$\langle \varphi(t), \varphi(t - n)\rangle = \delta_n.$$

(Hint: Do the analysis in the frequency domain; use Poisson's sum formula.)
(iii) Consider the box function in the frequency domain:

$$B(\omega) = \begin{cases} 1, & \lvert\omega\rvert \le \pi;\\ 0, & \lvert\omega\rvert > \pi. \end{cases}$$

Show that A(ω) = (1/2π) B(ω) ∗ B(ω) (convolution product). What can be said about
the rate of decay of a(t)?

3.9. Decay of Fourier Transform and Smoothness 



Prove that if X(ω) decays faster than 1/|ω|^{p+1} for large |ω| as in (3.79b), then x(t) ∈ C^p.
(Hint: Use (3.79a) and the differentiation property of the Fourier transform.)



3.10. Lipschitz Regularity 

Assume that X(ω) satisfies (3.82),

$$\int_{-\infty}^{\infty} \lvert X(\omega)\rvert\, (1 + \lvert\omega\rvert^{\alpha})\, d\omega < \infty.$$

(i) Show that x(t) is bounded.
(ii) Show that x(t) is Lipschitz α, that is, show (3.81):

$$\frac{\lvert x(t) - x(t_0)\rvert}{\lvert t - t_0\rvert^{\alpha}} \le c,$$

for any t and t_0 and 0 < α < 1.

(Hint: Express the above ratio in terms of the inverse Fourier transform, and divide
the integral into two parts: |t − t_0|^{−1} ≥ |ω| and |t − t_0|^{−1} < |ω|.)
(iii) Show how to extend the above characterization for r = n + α, n ∈ Z⁺.

3.11. Real Fourier Series 

Given is a 2π-periodic, real-valued function x(t) with Fourier series coefficients X_k. Show
that x(t) can be expressed as a real Fourier series

$$x(t) = a_0 + \sum_{k=1}^{\infty} \big(a_k \cos(kt) + b_k \sin(kt)\big),$$

where

$$
\begin{aligned}
a_0 &= \frac{1}{2\pi}\int_{-\pi}^{\pi} x(t)\, dt,\\
a_k &= \frac{1}{\pi}\int_{-\pi}^{\pi} x(t) \cos(kt)\, dt, \qquad k \in \mathbb{Z}^+,\\
b_k &= \frac{1}{\pi}\int_{-\pi}^{\pi} x(t) \sin(kt)\, dt, \qquad k \in \mathbb{Z}^+,
\end{aligned}
$$

and indicate the relationship between {a_k, b_k} and {X_k}.

3.12. Completeness of Fourier Series 

As part of Theorem 3.15, prove (3.95) for continuous, real-valued L² functions on [−T/2, T/2).

We do this in steps, introducing a deterministic autocorrelation function as a helper
function. We start with x(t) on [−T/2, T/2), assuming it is continuous and square
integrable.

(i) Introduce x_T(t) as a periodized version of x(t) as in (3.72). Show that

$$a(t) = \int_{-T/2}^{T/2} x_T(\tau)\, x_T(\tau + t)\, d\tau$$

is T-periodic and continuous.
(ii) Prove that (3.109) is indeed a Fourier-transform pair.
(iii) From the orthogonal projection theorem, we can write

$$\int_{-T/2}^{T/2} \lvert x(t)\rvert^2\, dt = \sum_{k=-N}^{N} \lvert X_k\rvert^2 + \int_{-T/2}^{T/2} \lvert x(t) - x_N(t)\rvert^2\, dt, \qquad \text{(P3.12-1)}$$

with x_N(t) as in (3.94). It suffices to show that, as N → ∞,

$$\int_{-T/2}^{T/2} \lvert x(t)\rvert^2\, dt = \sum_{k\in\mathbb{Z}} \lvert X_k\rvert^2 \qquad \text{(P3.12-2)}$$

to prove (3.95) and therefore completeness.

From (P3.12-1), verify that Σ_{k∈Z} |X_k|² < ∞, and since a(t) is continuous, we
can use Theorem 3.17 to prove that

$$a(t) = \sum_{k=-\infty}^{\infty} \lvert X_k\rvert^2\, e^{j(2\pi/T)kt}$$

for all t.
(iv) For the particular value t = 0, show (3.95).

3.13. Uniqueness of Fourier Series 

Consider T-periodic functions x_1(t) and x_2(t), each with an absolutely integrable period.

(i) Show that if X_{1,k} = X_{2,k} for all k, then they are equal almost everywhere.
(ii) Show that if they are continuous, then they are equal everywhere.

3.14. Integration of Fourier Series 

Given is a 2π-periodic, real-valued, zero-mean function x(t) with real Fourier series as in
Exercise 3.11. Consider the 2π-periodic primitive of x(t),

$$X(t) = \int_0^t x(\tau)\, d\tau, \qquad t \in [-\pi, \pi).$$

Show that the real Fourier series of X(t) is

$$X(t) = A_0 + \sum_{k=1}^{\infty} \Big(\frac{a_k}{k} \sin(kt) - \frac{b_k}{k} \cos(kt)\Big),$$

where A_0 = Σ_{k=1}^{∞} b_k/k is finite, and {a_k, b_k} are the real Fourier series coefficients of x(t).

3.15. Convolution Rule for Fourier Series 

Given is a T-periodic signal x(t) and a stable filter h(t). Prove that the following is a
Fourier series pair:

$$y(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau \quad \longleftrightarrow \quad Y_k = H\big(\tfrac{2\pi}{T}k\big)\, X_k,$$

where y(t) is the T-periodic output of the convolution.
3.16. Fourier Series of a Triangle Wave

In (3.115), the Fourier series of the triangle wave was computed using the integration
property. Obtain the same result using the following alternatives:

(i) Direct computation using the definition of the Fourier series (3.90a).
(ii) Using the Fourier transform of one period of the triangle (3.49f) and sampling (3.98).
(iii) Using the convolution property.

(Hint: Use a square wave and a filter that is the indicator function of [0, 1/2].)

3.17. Sawtooth Function and Gibbs Phenomenon 

Given is the sawtooth function of period T = 1, with one period given by

$$x(t) = \begin{cases} 1/2 - t, & -1/2 \le t < 0;\\ 0, & t = 0;\\ -1/2 - t, & 0 < t < 1/2. \end{cases} \qquad \text{(P3.17-1)}$$

Compute the Fourier series coefficients of x(t) and comment on a possible Gibbs phenomenon.
The previous two chapters dealt with discrete-time sequences indexed by integers and
continuous-time functions of a real variable. The primary purpose of the
present chapter is to link these two worlds. This is done through sampling, which 
produces a sequence from a function, and interpolation, which produces a function 
from a sequence. The ability to sample a function, manipulate the resulting se- 
quence with a discrete-time system, and then interpolate to produce a function, is 
the foundation of digital signal processing. Conversely, the ability to interpolate a 
sequence to create a function, manipulate the resulting function with a continuous- 
time system, and then sample to produce a sequence, is the foundation of digital 
communications. These interactions conceptually position the chapter as a bridge
between Chapters 2 and 3, as illustrated in Figure 4.1.

Given a continuous-time function, one can associate a sequence by simply
taking samples (evaluating or measuring the function) uniformly in time. Classical
sampling theory places a bandwidth restriction on the function so that the samples
are a faithful representation of the function, leading to the concept of the Nyquist rate,
a minimum rate of sampling so that changes in the function are captured. We
will develop this result in detail, but will also see it as a special case of a more
general theory involving shift-invariant subspaces. Our approach is intimately tied
to basis expansions and subspaces: sampling followed by interpolation projects to
a subspace, while interpolation alone embeds information within a subspace in a
higher-dimensional space.

Figure 4.1: Sampling and interpolation in signal processing and communications. (a)
Digital signal processing: sampling produces a discrete-time sequence from a continuous-
time function, which is then processed with a discrete-time system and interpolated.
(b) Digital communications: interpolation produces a continuous-time function from a
discrete-time sequence, which is then processed with a continuous-time system and then
sampled.



4.1 Introduction 

In Chapters 2 and 3, we saw a pair of bijections between sequences and functions:

$$
\begin{aligned}
\text{discrete-time sequence} \;&\overset{\text{DTFT}}{\longleftrightarrow}\; \text{periodic Fourier-domain function},\\
\text{periodic continuous-time function} \;&\overset{\text{FS}}{\longleftrightarrow}\; \text{discrete Fourier-domain sequence}.
\end{aligned}
$$

The first associates a periodic function, the discrete-time Fourier transform X(e^{jω}),
to a discrete-time sequence x_n, and the second associates a sequence, the Fourier
series X_k, to a periodic continuous-time function x(t).

While these are important connections, they are different in spirit from sampling
and interpolation, which both operate within the time domain. Although sampling and
interpolation are not restricted to mapping between discrete and continuous domains,
that is their most common use:

$$\text{discrete-time sequence} \;\underset{\text{sampling}}{\overset{\text{interpolation}}{\rightleftarrows}}\; \text{continuous-time function}.$$

In this section, we capture the main themes of the chapter by running through one
representative example that goes from continuous time to discrete time and back.
This expands upon Examples 1.14(i) and 1.14(iii).
Figure 4.2: Piecewise-constant functions over unit intervals.



Subspace of Functions Consider the space S of functions that are piecewise constant
over unit intervals, as in Figure 4.2:

$$x(t) = x(n) \quad \text{for all } t \in [n, n + 1),\ n \in \mathbb{Z}. \qquad (4.1)$$

The space S is shift-invariant with respect to integer shifts, that is, for x(t) ∈ S
and any k ∈ Z, x(t − k) also belongs to S. Because of (4.1), functions in S are
in one-to-one correspondence with sequences. If φ(t) = χ_{[0,1)}(t) is the indicator
function of the unit interval, the set {φ(t − k)}_{k∈Z} is an orthonormal basis for S.

Instead of unit-length intervals, we could consider intervals of any length T > 0.
This time, the set {(1/√T) φ(t/T − k)}_{k∈Z} is an orthonormal basis for S, the space
of piecewise-constant functions over intervals [kT, (k + 1)T). The interval length
T is called the sampling period and 1/T is called the sampling rate. Changing the 
sampling period T allows us to adjust the sampling rate to the function at hand 
(sample more often for a fast-varying function or sample less often for a slowly- 
varying function). 



Sampling Suppose we want to measure a continuous-time function x(t) to obtain
a sequence of real numbers y_n describing x(t). While the device we employ might
be able to take a measurement of x(t) at a single point in time, this measurement
would be sensitive to noise. Instead, it is more robust to measure an integral; for
all n ∈ Z,

$$
\begin{aligned}
y_n = \int_n^{n+1} x(t)\, dt &\overset{(a)}{=} \int_{-\infty}^{\infty} x(t)\, \varphi(t - n)\, dt\\
&= \big(\varphi(-t) * x(t)\big)\big|_{t=n} = \langle x(t), \varphi(t - n)\rangle_t \overset{(b)}{=} (\Phi^* x)_n,
\end{aligned}
\qquad (4.2)
$$

where (a) follows from the definition of the indicator function χ_{[0,1)}(t) = φ(t); and
in (b) Φ* denotes the sampling operator illustrated in Figure 4.3(a) with T = 1.
In other words, this sampling operator is implemented by filtering with φ(−t)
and sampling at integer instants. Since square integrability of x(t) implies square
summability of y_n, Φ* is a mapping from L²(R) to ℓ²(Z) (see Example 1.14(i)).
Figure 4.3: Sampling and interpolation for piecewise-constant functions. (a) Sampling
is filtering by φ(−t) followed by sampling at t = n. (b) Interpolation is pointwise
multiplication by the Dirac delta comb function s_1(t) followed by filtering by φ(t).



Interpolation Interpolation generates a function x̂(t) from a sequence y_n. One
way to do this is to form

$$\hat{x}(t) = \sum_{n\in\mathbb{Z}} y_n\, \varphi(t - n) = (\Phi y)(t), \qquad (4.3)$$

where Φ denotes the interpolation operator illustrated in Figure 4.3(b) with T = 1.
In other words, the sequence y_n is multiplied pointwise by the Dirac delta comb
function s_1(t) from (3.7), producing Σ_n y_n δ(t − n), before being filtered by φ(t).
From Examples 1.14(i) and 1.14(iii), we know that this operator is the adjoint of the
sampling one; thus our choice to call it Φ. Since x̂(t) is constant on intervals
[n, n + 1), n ∈ Z, a calculation similar to Example 1.14(i) shows that square
summability of y_n implies square integrability of x̂(t); thus, Φ is a mapping from
ℓ²(Z) to L²(R).



Interpolation Followed by Sampling: Sequence Recovery Figure 4.4(a) with
T = 1 depicts interpolation followed by sampling. Because of the specific choice of
sampling and interpolation operators, we have that Φ*Φ = I; in other words, any
sequence y_n ∈ ℓ²(Z) is recovered perfectly when the interpolated function computed
through (4.3) is used in the sampling formula (4.2):

$$\Phi^*\Phi\, y = y, \qquad y \in \ell^2(\mathbb{Z}).$$

Another choice for sampling or interpolation would not necessarily have led
to perfect recovery. Suppose that the interpolation operator had been different, for
example, linear instead of constant interpolation, that is, φ(t) had been |t| on
the unit interval and 0 otherwise; then, Φ*Φ ≠ I. We will discuss such instances
later in the chapter.

Sampling Followed by Interpolation: Function Recovery Figure 4.4(b) with T = 1
depicts sampling followed by interpolation. Because of the specific choice of
sampling and interpolation operators, as well as the fact that the function x(t)
belongs to S, the function is recovered perfectly when the samples computed through
(4.2) are used in the interpolation formula (4.3):

$$\Phi\Phi^* x(t) = x(t), \qquad x(t) \in S \subset \mathcal{L}^2(\mathbb{R}).$$
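The operators Φ* in (4.2) and Φ in (4.3) can be mimicked on a computer by sampling continuous time on a fine grid of M points per unit interval, so that integrals become Riemann sums; this discretization, the value of M, the test sequence and the test functions are all hypothetical choices of this Python sketch. The code checks Φ*Φ = I and ΦΦ*x = x for x ∈ S, as well as two properties of P = ΦΦ* for a function x ∉ S: idempotence, and orthogonality of the error x − Px to Px.

```python
import math

M = 200   # grid points per unit interval (hypothetical discretization of time)


def sample(x_grid):
    """Phi^*: y_n = integral of x(t) over [n, n+1), as a Riemann sum on the grid."""
    num_intervals = len(x_grid) // M
    return [sum(x_grid[n * M:(n + 1) * M]) / M for n in range(num_intervals)]


def interpolate(y):
    """Phi: x_hat(t) = sum_n y_n phi(t - n), i.e., y_n held constant on [n, n+1)."""
    return [yn for yn in y for _ in range(M)]


def inner(u, v):
    """Riemann-sum approximation of the L2 inner product of two real grid functions."""
    return sum(a * b for a, b in zip(u, v)) / M


num_intervals = 8
y = [3.0, -1.0, 0.5, 2.0, 0.0, 1.5, -2.5, 4.0]      # hypothetical sequence
x_in_S = interpolate(y)                             # a function in S
x = [math.sin(0.9 * (i + 0.5) / M) for i in range(num_intervals * M)]   # not in S

y_rec = sample(interpolate(y))        # Phi^* Phi y, should equal y
x_rec = interpolate(sample(x_in_S))   # Phi Phi^* x for x in S, should equal x
p = interpolate(sample(x))            # P x, the piecewise-constant approximation
pp = interpolate(sample(p))           # P (P x), should equal P x
err = [a - b for a, b in zip(x, p)]   # x - P x, should be orthogonal to P x

print(max(abs(a - b) for a, b in zip(y_rec, y)))
print(abs(inner(err, p)))
```

Since averaging over each interval is exactly what Φ* computes, the orthogonality of the error holds interval by interval, up to floating-point rounding.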
Figure 4.4: Sampling and interpolation for piecewise-constant functions. (a) Interpolation
as in (4.3) followed by sampling as in (4.2), Φ*Φ, recovers the input sequence perfectly
for any y_n ∈ ℓ²(Z), that is, ŷ_n = y_n. (b) Sampling as in (4.2) followed by interpolation
as in (4.3), ΦΦ*, recovers the input function perfectly when x(t) ∈ S, that is, x̂(t) = x(t).



Figure 4.5: Least-squares approximation of an arbitrary function x(t) by a piecewise-
constant approximation x̂(t) ∈ S. (a) Example function and its piecewise-constant
approximation. (b) Conceptual depiction of orthogonal projection.



Unlike interpolation followed by sampling, sampling followed by interpolation places restrictions on the input function to guarantee perfect recovery; here, the input function must belong to the subspace $S \subset \mathcal{L}^2(\mathbb{R})$. Both the sequence-recovery property and the function-recovery property depend on a proper match between the sampling and interpolation operators, as we will discuss later in the chapter.

When $x(t) \notin S$, sampling followed by interpolation, $P = \Phi\Phi^*$, does not act as the identity on $x(t)$, and we only obtain an approximation. Finding the closest function in $S$ (where distance is measured with the $\mathcal{L}^2$ norm) is simple because of Hilbert-space geometry. We find that

$$P^2 = \Phi\Phi^*\Phi\Phi^* \stackrel{(a)}{=} \Phi\Phi^* = P, \qquad P^* = (\Phi\Phi^*)^* = \Phi\Phi^* = P,$$

where (a) follows from $\Phi^*\Phi = I$. In other words, $P$ is idempotent and self-adjoint, that is, $P$ is an orthogonal projection operator. The projection theorem, Theorem 1.26, then states that, given an arbitrary $x(t) \in \mathcal{L}^2(\mathbb{R})$, sampling followed by interpolation results in $\hat{x}(t) = \Phi\Phi^* x(t)$, the best least-squares approximation of $x(t)$ in $S$ (see Figure 4.5).

Figure 4.6: Best least-squares approximation of $x = \frac{1}{2}[3\;\;5]^T$ via the orthogonal projection operator (4.5) in $\mathbb{R}^2$.

Another way to verify the best least-squares approximation property is to note that the set $\{\varphi(t-k)\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $S$ and that (4.3) is an orthonormal basis expansion formula. Thus, the approximation property follows from Theorem 1.40. Finally, one can explicitly verify that computing the average of a function over an interval minimizes the $\mathcal{L}^2$ norm of the error of a piecewise-constant approximation to the function (see Exercise 4.1).
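The averaging claim can be checked numerically. The short Python sketch below uses only the standard library. The test function $x(t) = t^2$ on $[0, 1)$ and the midpoint-rule grid are illustrative assumptions and do not come from the text. The sketch computes the squared $\mathcal{L}^2$ error of a constant approximation and confirms that the interval average gives a smaller error than nearby constants.

```python
# Numerical check (illustrative sketch): the constant c minimizing
# integral_0^1 (x(t) - c)^2 dt is the average of x over [0, 1).

def squared_error(x, c, a=0.0, b=1.0, steps=10_000):
    """Midpoint-rule approximation of integral_a^b (x(t) - c)^2 dt."""
    h = (b - a) / steps
    return sum((x(a + (i + 0.5) * h) - c) ** 2 for i in range(steps)) * h

def average(x, a=0.0, b=1.0, steps=10_000):
    """Midpoint-rule approximation of (1/(b-a)) integral_a^b x(t) dt."""
    h = (b - a) / steps
    return sum(x(a + (i + 0.5) * h) for i in range(steps)) * h / (b - a)

x = lambda t: t * t          # hypothetical test function on [0, 1)
c_avg = average(x)           # close to 1/3
err_avg = squared_error(x, c_avg)

# Every other constant tried here gives a larger error.
for c in (c_avg - 0.1, c_avg - 0.01, c_avg + 0.01, c_avg + 0.1):
    assert squared_error(x, c) > err_avg

print(c_avg, err_avg)
```

Shifting the constant by $\delta$ increases the error by exactly $\delta^2$ (up to quadrature error). This is the one-interval version of the orthogonality argument above.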

For a general interval length $T > 0$, not necessarily 1, and any fixed $T$, there are functions $x(t) \in \mathcal{L}^2(\mathbb{R})$ that differ appreciably from $\hat{x}_T(t)$, the closest function that is piecewise constant over the intervals $[kT, (k+1)T)$. However, considering all $T > 0$, these piecewise-constant functions are dense in $\mathcal{L}^2(\mathbb{R})$, and the approximation error between $\hat{x}_T(t)$ and $x(t)$ goes to zero as $T \to 0$. The rate at which this error goes to zero is an important parameter; it indicates the approximation power of the sampling and interpolation scheme (see Solved Exercise 4.1).
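To get a feel for the approximation rate, the following sketch measures the $\mathcal{L}^2$ error on $[0, 1)$ between a smooth function and its piecewise-constant approximation, built from interval averages, for decreasing $T$. Several choices are assumptions made for illustration: the test function $\sin(2\pi t)$, the restriction to $[0, 1)$, and the midpoint-rule quadrature. For such a smooth function, the error shrinks roughly in proportion to $T$.

```python
import math

def pc_approx_error(x, T, a=0.0, b=1.0, sub=200):
    """L2 error on [a, b) between x and its piecewise-constant approximation
    (the average of x over each interval [kT, (k+1)T)).
    Assumes (b - a) / T is an integer; integrals use a midpoint rule with
    `sub` points per interval (an illustrative discretization choice)."""
    n_intervals = round((b - a) / T)
    h = T / sub
    total = 0.0
    for k in range(n_intervals):
        start = a + k * T
        vals = [x(start + (i + 0.5) * h) for i in range(sub)]
        mean = sum(vals) / sub
        total += sum((v - mean) ** 2 for v in vals) * h
    return math.sqrt(total)

x = lambda t: math.sin(2 * math.pi * t)   # hypothetical smooth test function
errors = [pc_approx_error(x, T) for T in (1/4, 1/8, 1/16, 1/32)]
ratios = [e1 / e2 for e1, e2 in zip(errors, errors[1:])]
print(errors)
print(ratios)   # ratios approach 2: halving T roughly halves the error
```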



Sampling and Interpolation with Nonorthogonal Vectors In the discussion so far, the sampling and interpolation operators have been adjoints of each other, making $P = \Phi\Phi^*$ an orthogonal projection operator. This does not have to be the case, and throughout this chapter we will see instances of general sampling and interpolation operators, $\tilde{\Phi}^*$ and $\Phi$, respectively. To get a feel for these, we look at the simplest setting, vectors in $\mathbb{R}^2$. To compare the orthogonal and nonorthogonal cases, we start with the orthogonal case and then follow with nonorthogonal ones.



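(i) Consider first the sampling and interpolation operators to be

$$\Phi^* = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \end{bmatrix}, \qquad \Phi = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{4.4a}$$

The null space of the sampling operator and its orthogonal complement, which is the same as the range of the interpolation operator, are

$$\mathcal{N}(\Phi^*) = \left\{ \alpha \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}, \qquad S = \mathcal{R}(\Phi) = \left\{ \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}, \qquad \alpha \in \mathbb{R}. \tag{4.4b}$$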
Figure 4.7: Approximation of $x = \frac{1}{2}[3\;\;5]^T$ via the projection operator (4.7).



Clearly, interpolation followed by sampling leads to identity,

$$\Phi^*\Phi = 1.$$

What is more interesting is that sampling followed by interpolation, $P = \Phi\Phi^*$, is an orthogonal projection operator,

$$P = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad P^2 = P, \qquad P^* = P. \tag{4.5}$$

Then, for any given $x \in \mathbb{R}^2$, $\hat{x} = Px$ is the best least-squares approximation of $x$ in $S$; when $x \in S$, then $\hat{x} = x$. Figure 4.6 illustrates these spaces as well as the approximation for $x = \frac{1}{2}[3\;\;5]^T$.
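(ii) We keep the interpolation operator from (4.4a) the same and choose the sampling operator to be

$$\tilde{\Phi}^* = \frac{1}{2\sqrt{2}}\begin{bmatrix} 1 & 3 \end{bmatrix}. \tag{4.6a}$$

The null space of the sampling operator and its orthogonal complement (no longer the same as the range of the interpolation operator) are

$$\mathcal{N}(\tilde{\Phi}^*) = \left\{ \alpha \begin{bmatrix} 3 \\ -1 \end{bmatrix} \right\}, \qquad \tilde{S} = \left\{ \alpha \begin{bmatrix} 1 \\ 3 \end{bmatrix} \right\}, \qquad \alpha \in \mathbb{R}. \tag{4.6b}$$

Again, interpolation followed by sampling leads to identity,

$$\tilde{\Phi}^*\Phi = 1.$$

However, sampling followed by interpolation, $P = \Phi\tilde{\Phi}^*$, is no longer an orthogonal projection operator, although it is still a projection operator,

$$P = \frac{1}{4}\begin{bmatrix} 1 & 3 \\ 1 & 3 \end{bmatrix}, \qquad P^2 = P, \qquad P^* \neq P. \tag{4.7}$$

When $x \in S$, then $\hat{x} = x$. Figure 4.7 illustrates these spaces as well as the approximation for $x = \frac{1}{2}[3\;\;5]^T$.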
Figure 4.8: Operator (4.9), which is not a projection, applied to $x = \frac{1}{2}[3\;\;5]^T$. Triangles denote $P^2x$, $P^3x$, $P^4x$.



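(iii) We still keep the interpolation operator from (4.4a) the same and choose the sampling operator to be

$$\tilde{\Phi}^* = \frac{1}{2\sqrt{2}}\begin{bmatrix} 1 & 2 \end{bmatrix}. \tag{4.8a}$$

The null space of the sampling operator and its orthogonal complement are

$$\mathcal{N}(\tilde{\Phi}^*) = \left\{ \alpha \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right\}, \qquad \tilde{S} = \left\{ \alpha \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, \qquad \alpha \in \mathbb{R}. \tag{4.8b}$$

This time, interpolation followed by sampling does not lead to identity,

$$\tilde{\Phi}^*\Phi = \tfrac{3}{4} \neq 1.$$

Sampling followed by interpolation, $P = \Phi\tilde{\Phi}^*$, is no longer even a projection operator,

$$P = \frac{1}{4}\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}, \qquad P^2 = \frac{3}{16}\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} \neq P, \qquad P^* \neq P. \tag{4.9}$$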



Since $P$ is not a projection operator, applying it again moves the point along $S$ away from $Px$. In general,

$$P^k = \frac{1}{4}\left(\frac{3}{4}\right)^{k-1}\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix},$$

meaning that eventually the point is mapped to $0$. Figure 4.8 illustrates this effect,$^{77}$ as well as the spaces involved, for $x = \frac{1}{2}[3\;\;5]^T$.
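The three cases (i) to (iii) can be reproduced with a few lines of Python. This sketch uses plain lists and floating point, and the helper names are our own. For each sampling operator, it checks whether interpolation followed by sampling is the identity, whether $P = \Phi\tilde{\Phi}^*$ is idempotent, and whether $P$ is self-adjoint. It then applies the operator of case (iii) repeatedly to $x = \frac{1}{2}[3\;\;5]^T$.

```python
import math

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B, tol=1e-12):
    """Entrywise comparison up to floating-point rounding."""
    return all(math.isclose(a, b, abs_tol=tol)
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

r2 = math.sqrt(2)
Phi = [[1 / r2], [1 / r2]]                    # interpolation operator (4.4a), 2 x 1

samplers = {
    "i": [[1 / r2, 1 / r2]],                  # (4.4a): the adjoint of Phi
    "ii": [[1 / (2 * r2), 3 / (2 * r2)]],     # (4.6a)
    "iii": [[1 / (2 * r2), 2 / (2 * r2)]],    # (4.8a)
}

results = {}
for name, sampler in samplers.items():
    P = matmul(Phi, sampler)                  # sampling followed by interpolation
    results[name] = (
        close(matmul(sampler, Phi), [[1.0]]), # interpolation then sampling = identity?
        close(matmul(P, P), P),               # idempotent (a projection)?
        close(transpose(P), P),               # self-adjoint (an orthogonal projection)?
    )
    print(name, results[name])

# Case (iii): repeated application moves x = (1/2)[3 5]^T along S toward 0.
P3 = matmul(Phi, samplers["iii"])
x = [[1.5], [2.5]]
trajectory = []
for _ in range(4):
    x = matmul(P3, x)
    trajectory.append([v[0] for v in x])
print(trajectory)
```

In case (i), all three checks pass. In case (ii), only self-adjointness fails. In case (iii), all three fail. Each iterate in case (iii) is $3/4$ of the previous one, which matches $P^k$ above.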

Chapter Outline 

Following this brief introduction, the chapter treats different spaces in turn. Section 4.2 develops sampling theory from the perspective of linear operators (matrices) in finite-dimensional spaces; we do that for orthonormal sets of vectors first, followed by nonorthogonal ones. Sections 4.3 and 4.4 do the same for sequences and functions, respectively. In these settings, the use of LSI filtering naturally leads to the sampling theory for shift-invariant subspaces, as well as for the special case of bandlimited sequences and functions. The celebrated sampling theorem for bandlimited functions is presented both with a classical justification and as an orthonormal expansion of the subspace of bandlimited functions of a given bandwidth. We also consider multichannel sampling. Section 4.6 considers the stochastic setting, and finally, Section 4.7 concludes with a discussion of computational aspects.

$^{77}$Here, the points move along $S$ closer to the origin because the operator has eigenvalues smaller than 1 in absolute value. Choosing a different scaling in (4.8a), for example, $1/2$ instead of $1/(2\sqrt{2})$, would make the points move along $S$ to infinity. Note that a scaling of $\sqrt{2}/3$ makes the operator idempotent.

Notation used in this chapter: Starting with Section 4.3, we use $\varphi$ and $g$ in parallel to denote sequences/functions from the points of view of expansions in bases/projections onto subspaces, as well as signal processing using filtering. We do this to show the connections between these points of view.



4.2 Finite-Dimensional Vectors 

We now look into the interpretation of sampling and interpolation when the larger space is the space of finite-dimensional vectors, $\mathbb{C}^M$. The subspaces of interest are $N$-dimensional, with $N < M$ (and often $N \ll M$). Then sampling takes $M$ values and produces $N < M$ values, while interpolation takes $N$ values and produces $M > N$ values.

4.2.1 Sampling and Interpolation with Orthonormal Vectors 

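Sampling Sampling is a linear operator from $\mathbb{C}^M$ to $\mathbb{C}^N$, so it can be represented by an $N \times M$ matrix $\Phi^*$. For a given input vector $x \in \mathbb{C}^M$, the sampling output is a vector $y \in \mathbb{C}^N$,

$$y = \begin{bmatrix} \langle x, \varphi_0\rangle \\ \langle x, \varphi_1\rangle \\ \vdots \\ \langle x, \varphi_{N-1}\rangle \end{bmatrix}_{N\times 1} = \begin{bmatrix} \varphi_0^* \\ \varphi_1^* \\ \vdots \\ \varphi_{N-1}^* \end{bmatrix}_{N\times M} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{M-1} \end{bmatrix}_{M\times 1} = \Phi^* x. \tag{4.10}$$

In the above matrix, $\varphi_k^*$ is the $k$th row of $\Phi^*$; we assume the vectors $\varphi_k$ to be orthonormal,

$$\langle \varphi_n, \varphi_k \rangle = \delta_{n-k} \quad\Leftrightarrow\quad \Phi^*\Phi = I, \tag{4.11}$$

and thus, $\Phi^*$ has maximum rank $N$. Then, the sampling operator has an $(M-N)$-dimensional null space, $\mathcal{N}(\Phi^*)$; the set $\{\varphi_k\}_{k=0}^{N-1}$ spans its orthogonal complement, $S = \mathcal{N}(\Phi^*)^\perp = \operatorname{span}(\{\varphi_k\}_{k=0}^{N-1})$. In other words, when a vector $x \in \mathbb{C}^M$ is sampled, the component that remains is in $S$ and is captured by $\Phi^* x$; the component that is lost due to sampling is in the null space $S^\perp$. We illustrate this with an example.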

Example 4.1 (Sampling in $\mathbb{R}^4$) Let us define sampling of $x \in \mathbb{R}^4$ to obtain three samples $y \in \mathbb{R}^3$ as in Figure 4.9(a), where solid lines have weight $1/2$, while dashed lines have weight $-1/2$; for example, $y_1 = (x_0 - x_1 + x_2 - x_3)/2$. Then, the sampling operator can be written as

$$\Phi^* = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \end{bmatrix}. \tag{4.12a}$$

[Figure 4.9 panels: (a) Sampling. (b) Interpolation.]

Figure 4.9: Sampling and interpolation in $\mathbb{R}^4$ with orthonormal vectors. Solid lines have weight $1/2$, while dashed lines have weight $-1/2$.



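With $\alpha \in \mathbb{R}$ and $\alpha_k \in \mathbb{R}$, the null space of $\Phi^*$ and its orthogonal complement are

$$S^\perp = \mathcal{N}(\Phi^*) = \left\{ \alpha \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} \right\}, \qquad S = \left\{ \alpha_0 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + \alpha_1 \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} + \alpha_2 \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix} \right\}. \tag{4.12b}$$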



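Applying the sampling operator to a vector in $\mathbb{R}^4$, for example, $x = [2\;\;0\;\;0\;\;2]^T$, gives

$$\Phi^* \begin{bmatrix} 2 \\ 0 \\ 0 \\ 2 \end{bmatrix} = \Phi^* \Bigg( \underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}}_{\in S} + \underbrace{\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}}_{\in S^\perp} \Bigg) = \Phi^* \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + \underbrace{\Phi^* \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}}_{=0} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}.$$

Thus, one component of the vector, the one belonging to the null space $S^\perp$, was mapped to $0$ and lost.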
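Example 4.1 can be verified with exact rational arithmetic. The sketch below uses Python's `fractions` module, and the variable names are ours. It checks that the rows of (4.12a) are orthonormal and that $[1\;\;-1\;\;-1\;\;1]^T$ lies in the null space. It also checks that sampling $x = [2\;\;0\;\;0\;\;2]^T$ keeps only its component in $S$.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Sampling operator (4.12a): rows are phi_0^*, phi_1^*, phi_2^*.
Phi_adj = [[half * v for v in row] for row in ([1, 1, 1, 1],
                                                 [1, -1, 1, -1],
                                                 [1, 1, -1, -1])]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(A, x):
    """Matrix-vector product A x."""
    return [dot(row, x) for row in A]

# Rows are orthonormal: Phi^* Phi = I, as assumed in (4.11).
gram = [[dot(r, s) for s in Phi_adj] for r in Phi_adj]
assert gram == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# The null space (4.12b) is spanned by [1, -1, -1, 1].
null_vec = [1, -1, -1, 1]
assert apply(Phi_adj, null_vec) == [0, 0, 0]

# Decompose x = [2, 0, 0, 2] into a part in S and a part in S^perp.
x = [2, 0, 0, 2]
x_S, x_perp = [1, 1, 1, 1], null_vec
assert [a + b for a, b in zip(x_S, x_perp)] == x
y = apply(Phi_adj, x)
print(y)   # the S^perp component contributes nothing
```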

Interpolation Interpolation is a linear operator from $\mathbb{C}^N$ to $\mathbb{C}^M$, $N < M$, so it can be represented by an $M \times N$ matrix; we choose that matrix to be the adjoint of the sampling operator, $\Phi$, as we have done in the introductory example. For a given input vector $y \in \mathbb{C}^N$, the interpolation output is a vector $x \in \mathbb{C}^M$,

$$x = \begin{bmatrix} \varphi_0 & \varphi_1 & \cdots & \varphi_{N-1} \end{bmatrix}_{M\times N} \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix}_{N\times 1} = \Phi y = \sum_{k=0}^{N-1} y_k \varphi_k. \tag{4.13}$$

In the above matrix, $\varphi_k$ is the $k$th column of $\Phi$. As was true for $\Phi^*$, $\Phi$ has maximum rank $N$. Thus, the interpolation operator has an $N$-dimensional range $S$, which is a proper subspace of $\mathbb{C}^M$ and is given by $S = \operatorname{span}(\{\varphi_k\}_{k=0}^{N-1})$. This subspace is, of course, the same as the orthogonal complement of the null space of the sampling operator, as we have seen earlier.



Example 4.2 (Interpolation in $\mathbb{R}^4$) From (4.12a), the interpolation operator is

$$\Phi = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & -1 \end{bmatrix}, \tag{4.14}$$

as illustrated in Figure 4.9(b). The range of this operator is $S$ (the same as the orthogonal complement of the null space of the sampling operator in (4.12b)).

Interpolation Followed by Sampling Interpolation followed by sampling is described by $\Phi^*\Phi$, which maps from the smaller space, $\mathbb{C}^N$, to itself. Since by assumption (4.11) holds, $y$ is perfectly recovered. Equation (4.11) also shows that the condition for perfect recovery is the same as the set of vectors $\{\varphi_k\}_{k=0}^{N-1}$ being orthonormal, as in (1.83). This set of vectors is not a basis for $\mathbb{C}^M$ (too few vectors); instead, it is an orthonormal basis for the $N$-dimensional subspace $S$ it spans.

Sampling Followed by Interpolation Sampling followed by interpolation is described by $P = \Phi\Phi^*$. Intuitively, this is a more difficult sequence of operations to recover from perfectly, since sampling leads to a loss unless the input is in $S$. Given our choice of sampling and interpolation operators,

$$P^2 = \Phi\Phi^*\Phi\Phi^* \stackrel{(a)}{=} \Phi\Phi^* = P, \qquad P^* = (\Phi\Phi^*)^* = \Phi\Phi^* = P,$$

where (a) follows from (4.11). In other words, $P$ is idempotent and self-adjoint, that is, $P$ is an orthogonal projection operator. Then, by Theorem 1.26, $Px$ is the best least-squares approximation of $x$ in $S$; if $x \in S$, sampling followed by interpolation will perfectly recover $x$:
Theorem 4.1 (Recovery for finite-dimensional vectors) Given is sampling followed by interpolation with sampling operator $\Phi^*$ from (4.10) and interpolation operator $\Phi$ satisfying (4.11). Then, with $S = \mathcal{R}(\Phi)$,

$$\hat{x} = \Phi\Phi^* x \tag{4.15}$$

is the best least-squares approximation of $x$ in $S$, that is,

$$\hat{x} = \arg\min_{x_S \in S} \|x - x_S\|_2, \qquad x - \hat{x} \perp S.$$

When $x \in S$, then $\hat{x} = x$.













Example 4.3 (Sampling followed by interpolation in $\mathbb{R}^4$) We now illustrate the above result.

Choose first $x \in S$. Then the output is

$$Px = \Phi\Phi^* x \stackrel{(a)}{=} \Phi\Phi^*(\alpha_0\varphi_0 + \alpha_1\varphi_1 + \alpha_2\varphi_2) \stackrel{(b)}{=} \alpha_0\varphi_0 + \alpha_1\varphi_1 + \alpha_2\varphi_2 = x,$$

where (a) follows because $x \in S$ and can thus be expressed as a linear combination of its basis vectors, $\{\varphi_k\}_{k=0}^{2}$; and (b) follows from the expression for $\Phi$, (4.14).

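Choose next $x \notin S$, the vector $x = [2\;\;0\;\;0\;\;2]^T$ from Example 4.1. The best we can do now is compute an approximation. Applying $P = \Phi\Phi^*$,

$$\hat{x} = Px = \Phi\Phi^* \begin{bmatrix} 2 \\ 0 \\ 0 \\ 2 \end{bmatrix} = \Phi \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},$$

which clearly belongs to $S$, since it is a multiple of the first basis vector, $\hat{x} = 2\varphi_0$. It is also the closest vector in $S$, since the error between $x$ and $\hat{x}$ is

$$x - \hat{x} = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} \perp S.$$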
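This sketch continues in the same exact-arithmetic style. It forms $P = \Phi\Phi^*$ from (4.12a) and (4.14), checks that $P$ is idempotent and self-adjoint, and reproduces both cases of Example 4.3. The coefficients used to build the vector in $S$ are arbitrary.

```python
from fractions import Fraction

half = Fraction(1, 2)
Phi_adj = [[half * v for v in row] for row in ([1, 1, 1, 1],
                                                 [1, -1, 1, -1],
                                                 [1, 1, -1, -1])]
Phi = [list(col) for col in zip(*Phi_adj)]   # (4.14): real case, adjoint = transpose

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

def apply(A, x):
    return [sum(a * b for a, b in zip(r, x)) for r in A]

P = matmul(Phi, Phi_adj)                       # sampling followed by interpolation
assert matmul(P, P) == P                       # idempotent
assert [list(c) for c in zip(*P)] == P         # self-adjoint

# A vector in S (a combination of the columns of Phi) is recovered exactly.
alpha = [3, -1, 2]                             # arbitrary coefficients
x_in_S = apply(Phi, alpha)
assert apply(P, x_in_S) == x_in_S

# x = [2, 0, 0, 2] is not in S: P returns its best approximation.
x = [2, 0, 0, 2]
x_hat = apply(P, x)
err = [a - b for a, b in zip(x, x_hat)]
print(x_hat, err)
# The error is orthogonal to every column of Phi, that is, to S.
assert all(sum(e * c for e, c in zip(err, col)) == 0 for col in zip(*Phi))
```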



Finite-dimensional vectors can be seen as finite-length sequences from Chapter 2. In that case, we know that the DFT is an appropriate Fourier transform when these sequences are circularly extended (or, viewed as periodic sequences). We could then define bandlimited subspaces of $\mathbb{C}^M$ (similarly to what we will do in the following sections for sequences and functions). Exercise 4.2 explores sampling and interpolation in such subspaces.

4.2.2 Sampling and Interpolation with Nonorthogonal Vectors 

What we have seen thus far is a rather classical take on sampling and interpolation; we now expand it to include nonorthogonal vectors. As in our discussion of orthonormal and biorthogonal pairs of bases, nonorthogonal vectors make the geometry more complicated. The sampling and interpolation operators are no longer adjoints of each other. Also, the two spaces we discussed earlier, the range of the interpolation operator and the orthogonal complement of the null space of the sampling operator, are no longer the same.



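Sampling Again, sampling is represented by an $N \times M$ matrix as in (4.10), but this time its rows are not orthogonal. We call these rows $\tilde{\varphi}_k^*$ and the corresponding sampling matrix $\tilde{\Phi}^*$. Thus, for a given input vector $x \in \mathbb{C}^M$, the sampling output is a vector $y \in \mathbb{C}^N$,

$$y = \begin{bmatrix} \langle x, \tilde{\varphi}_0\rangle \\ \langle x, \tilde{\varphi}_1\rangle \\ \vdots \\ \langle x, \tilde{\varphi}_{N-1}\rangle \end{bmatrix}_{N\times 1} = \begin{bmatrix} \tilde{\varphi}_0^* \\ \tilde{\varphi}_1^* \\ \vdots \\ \tilde{\varphi}_{N-1}^* \end{bmatrix}_{N\times M} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{M-1} \end{bmatrix}_{M\times 1} = \tilde{\Phi}^* x. \tag{4.16}$$

We again assume $\tilde{\Phi}^*$ to have maximum rank $N$. Thus, the sampling operator has an $(M-N)$-dimensional null space, $\mathcal{N}(\tilde{\Phi}^*)$; the set $\{\tilde{\varphi}_k\}_{k=0}^{N-1}$ spans its orthogonal complement, $\tilde{S} = \mathcal{N}(\tilde{\Phi}^*)^\perp = \operatorname{span}(\{\tilde{\varphi}_k\}_{k=0}^{N-1})$. In other words, when a vector $x \in \mathbb{C}^M$ is sampled, the component that remains is in $\tilde{S}$ and is captured by $\tilde{\Phi}^* x$; the component that is lost due to sampling is in the null space $\tilde{S}^\perp$.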

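Example 4.4 (Sampling in $\mathbb{R}^4$) Let us define sampling of $x \in \mathbb{R}^4$ to obtain three samples $y \in \mathbb{R}^3$ using the midpoints of neighboring pairs of samples, as shown in Figure 4.10(a). Sample $y_k$ is the average of $x_k$ and $x_{k+1}$, so the sampling operator can be written as

$$\tilde{\Phi}^* = \frac{1}{2}\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}. \tag{4.17a}$$

With $\beta \in \mathbb{R}$ and $\beta_k \in \mathbb{R}$, the null space of $\tilde{\Phi}^*$ and its orthogonal complement are

$$\mathcal{N}(\tilde{\Phi}^*) = \left\{ \beta \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} \right\}, \qquad \tilde{S} = \left\{ \beta_0 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + \beta_1 \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} + \beta_2 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \right\}. \tag{4.17b}$$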
[Figure 4.10 panels: (a) Sampling. (b) Interpolation.]

Figure 4.10: Sampling and interpolation in $\mathbb{R}^4$ with nonorthogonal vectors. Solid lines have weight $1/2$, while dashed lines have weight $-1/2$; factors above the line multiply $1/2$.



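Interpolation Again, interpolation is represented by an $M \times N$ matrix $\Phi$, but this time it is not the adjoint of the sampling operator $\tilde{\Phi}^*$. For a given input vector $y \in \mathbb{C}^N$, the interpolation output is a vector $x \in \mathbb{C}^M$; $\Phi$ looks the same as in (4.13). When the interpolation operator is specially chosen so that

$$\Phi = \tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1}, \tag{4.18}$$

that is, it is the pseudoinverse of $\tilde{\Phi}^*$, then $S = \tilde{S}$, because

$$S = \mathcal{R}(\Phi) \stackrel{(a)}{=} \mathcal{R}\big(\tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1}\big) \stackrel{(b)}{=} \mathcal{R}(\tilde{\Phi}) \stackrel{(c)}{=} \mathcal{N}(\tilde{\Phi}^*)^\perp \stackrel{(d)}{=} \tilde{S}, \tag{4.19}$$

where (a) follows from (4.18); (b) because $\mathcal{R}(AB) = \mathcal{R}(A)$ when $B$ is invertible; (c) from (1.49a); and (d) from the definition of $\tilde{S}$.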

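Example 4.5 (Interpolation in $\mathbb{R}^4$) Define the interpolation operator as the pseudoinverse of (4.17a),

$$\Phi = \tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1} = \frac{1}{2}\begin{bmatrix} 3 & -2 & 1 \\ 1 & 2 & -1 \\ -1 & 2 & 1 \\ 1 & -2 & 3 \end{bmatrix}, \tag{4.20a}$$

as illustrated in Figure 4.10(b). With $\alpha_k \in \mathbb{R}$, the range of $\Phi$ is

$$S = \left\{ \alpha_0 \begin{bmatrix} 3 \\ 1 \\ -1 \\ 1 \end{bmatrix} + \alpha_1 \begin{bmatrix} -1 \\ 1 \\ 1 \\ -1 \end{bmatrix} + \alpha_2 \begin{bmatrix} 1 \\ -1 \\ 1 \\ 3 \end{bmatrix} \right\}. \tag{4.20b}$$

By comparing (4.17b) and (4.20b), we can easily check that $S = \tilde{S}$.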
Interpolation Followed by Sampling Interpolation followed by sampling is described by $\tilde{\Phi}^*\Phi$, which maps from the smaller space, $\mathbb{C}^N$, to itself. From the dimensions of the operators, it is possible for $y$ to be perfectly recovered; this happens when $\tilde{\Phi}^*$ is a left inverse of $\Phi$,

$$\tilde{\Phi}^*\Phi = I \quad\Leftrightarrow\quad \langle \varphi_n, \tilde{\varphi}_k \rangle = \delta_{n-k}. \tag{4.21}$$

The sampling and interpolation operators are then called consistent. Choosing the pseudoinverse in (4.18) for $\Phi$ satisfies (4.21); of course, there exist infinitely many other choices of $\Phi$ satisfying (4.21) that we could also use. The above also shows that the condition for perfect recovery is the same as the sets of vectors $\{\varphi_k\}_{k=0}^{N-1}$ and $\{\tilde{\varphi}_k\}_{k=0}^{N-1}$ being biorthogonal, as in (1.102). These sets of vectors are not bases for $\mathbb{C}^M$ (too few vectors); instead, they form a biorthogonal pair of bases for the $N$-dimensional subspaces $S$ and $\tilde{S}$ they span, respectively.

Example 4.6 (Interpolation followed by sampling in $\mathbb{R}^4$) Because we have chosen the interpolation operator in (4.20a) to be the pseudoinverse of the sampling operator in (4.17a), interpolation followed by sampling leads to perfect recovery,

$$\tilde{\Phi}^*\Phi = I. \tag{4.22}$$
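The pseudoinverse (4.18) for Example 4.5 can be computed exactly. The sketch below uses a small Gauss-Jordan inversion written just for this illustration, not a library routine. It prints the entries of $\Phi$ and checks the consistency condition (4.22).

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

def transpose(A):
    return [list(c) for c in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse of a square matrix with exact Fraction arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

half = Fraction(1, 2)
# (4.17a): y_k is the average of x_k and x_{k+1}.
Phi_t_adj = [[half * v for v in row] for row in ([1, 1, 0, 0],
                                                   [0, 1, 1, 0],
                                                   [0, 0, 1, 1])]
Phi_t = transpose(Phi_t_adj)

# (4.18): Phi = Phi_t (Phi_t^* Phi_t)^{-1}, the pseudoinverse of Phi_t^*.
Phi = matmul(Phi_t, inverse(matmul(Phi_t_adj, Phi_t)))
for row in Phi:
    print([str(v) for v in row])

# Consistency (4.21)/(4.22): Phi_t^* Phi = I.
assert matmul(Phi_t_adj, Phi) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```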



Sampling Followed by Interpolation Sampling followed by interpolation is described by $P = \Phi\tilde{\Phi}^*$. When the sampling and interpolation operators are consistent as in (4.21), then

$$P^2 = \Phi\tilde{\Phi}^*\Phi\tilde{\Phi}^* \stackrel{(a)}{=} \Phi\tilde{\Phi}^* = P,$$

where (a) follows from consistency. In other words, the idempotency of $P$ is guaranteed by consistency, that is, consistency implies that $P$ is a projection operator, albeit not necessarily an orthogonal one.$^{78}$ Figure 4.11 shows what happens in that case: $P$ projects onto $S$, but the projection is not orthogonal. The approximation error $x - \hat{x}$ is orthogonal to $\tilde{S}$ but not, in general, to $S$.

$^{78}$Projection operator and oblique projection operator are synonyms (see Chapter 1); we choose to use projection operator to mean a nonorthogonal projection operator.

It turns out that for $P$ to be self-adjoint as well, $\Phi$ must be chosen to be the pseudoinverse of $\tilde{\Phi}^*$, (4.18); the sampling and interpolation operators are then called ideally matched,

$$P^* = (\Phi\tilde{\Phi}^*)^* \stackrel{(a)}{=} \big(\tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1}\tilde{\Phi}^*\big)^* = \tilde{\Phi}\big((\tilde{\Phi}^*\tilde{\Phi})^{-1}\big)^*\tilde{\Phi}^* = \tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1}\tilde{\Phi}^* \stackrel{(b)}{=} \Phi\tilde{\Phi}^* = P,$$

where (a) and (b) follow from (4.18), and the third equality uses the fact that $\tilde{\Phi}^*\tilde{\Phi}$ is self-adjoint. In other words, the self-adjointness of $P$ is guaranteed by sampling and interpolation being ideally matched, in which case $S = \tilde{S}$.

The previous discussion can be summarized as follows (see also Figure 4.11):
Figure 4.11: Subspaces defined in sampling and interpolation. $\tilde{S}$ represents what can be measured; it is the orthogonal complement of the null space of the sampling operator $\tilde{\Phi}^*$. $S$ represents what can be reproduced; it is the range of the interpolation operator $\Phi$. When sampling and interpolation are consistent, $\Phi\tilde{\Phi}^*$ is a projection and $x - \hat{x}$ is orthogonal to $\tilde{S}$. When, furthermore, $\tilde{S} = S$, the projection becomes an orthogonal projection.



Theorem 4.2 (Recovery for finite-dimensional vectors) Given is sampling followed by interpolation with sampling operator $\tilde{\Phi}^*$ from (4.16) and interpolation operator $\Phi$ from (4.13). Then, with $S = \mathcal{R}(\Phi)$ and $\tilde{S} = \mathcal{N}(\tilde{\Phi}^*)^\perp$:

(i) When $P = \Phi\tilde{\Phi}^*$ is idempotent, that is, the sampling and interpolation operators are consistent, then $P$ is a projection operator. When $x \in S$, $x$ is perfectly recovered.

(ii) When $P = \Phi\tilde{\Phi}^*$ is idempotent and self-adjoint, that is, the sampling and interpolation operators are consistent and ideally matched, then $P$ is an orthogonal projection operator. When $x \in S$, $x$ is perfectly recovered.



Example 4.7 (Sampling followed by interpolation in $\mathbb{R}^4$) The interpolation operator $\Phi$ in (4.20a) is the pseudoinverse of the sampling operator $\tilde{\Phi}^*$ in (4.17a); they are thus ideally matched, and $P$ is self-adjoint. Together with consistency, (4.22), which guarantees that $P$ is idempotent, we get that $P$ is an orthogonal projection operator, and it projects onto $S$,

$$P = \Phi\tilde{\Phi}^* = \frac{1}{4}\begin{bmatrix} 3 & 1 & -1 & 1 \\ 1 & 3 & 1 & -1 \\ -1 & 1 & 3 & 1 \\ 1 & -1 & 1 & 3 \end{bmatrix}.$$
Take $x = [2\;\;0\;\;0\;\;2]^T \in S$. A simple calculation shows that $Px = x$, that is, perfect recovery. Choosing a vector not in $S$, on the other hand, results in the best least-squares approximation of $x$ in $S$. If $x = [1\;\;-1\;\;1\;\;-1]^T$, which lies in the null space of $\tilde{\Phi}^*$, then $Px = 0$.
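In this sketch, $\Phi$ from (4.20a) is entered by hand. The code forms $P = \Phi\tilde{\Phi}^*$ and checks the claims of Example 4.7: $P$ is idempotent and symmetric, a vector in $S$ is reproduced exactly, and the null-space vector $[1\;\;-1\;\;1\;\;-1]^T$ is mapped to $0$.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

def apply(A, x):
    return [sum(a * b for a, b in zip(r, x)) for r in A]

# Sampling operator (4.17a) and its pseudoinverse (4.20a), entered by hand.
Phi_t_adj = [[F(1, 2), F(1, 2), 0, 0], [0, F(1, 2), F(1, 2), 0], [0, 0, F(1, 2), F(1, 2)]]
Phi = [[F(3, 2), -1, F(1, 2)], [F(1, 2), 1, F(-1, 2)],
       [F(-1, 2), 1, F(1, 2)], [F(1, 2), -1, F(3, 2)]]

P = matmul(Phi, Phi_t_adj)
assert matmul(P, P) == P                     # idempotent (consistency)
assert [list(c) for c in zip(*P)] == P       # self-adjoint (ideal matching)

print(apply(P, [2, 0, 0, 2]))                # vector in S: recovered exactly
print(apply(P, [1, -1, 1, -1]))              # spans the null space: mapped to 0
```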
To get ideally matched sampling and interpolation operators that are adjoints of each other, we can apply Gram-Schmidt orthogonalization to $\Phi$ (see Algorithm 1.1), yielding a new pair of sampling and interpolation operators, $\Psi^*$ and $\Psi$, this time with orthonormal vectors.
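The following sketches the Gram-Schmidt route. The code orthonormalizes the columns of $\Phi$ from (4.20a) using classical Gram-Schmidt. This is our own short implementation, not a transcription of the book's Algorithm 1.1. The code then checks that $\Psi^*\Psi = I$ and that $\Psi\Psi^*$ equals the orthogonal projector $P$ of Example 4.7, as expected since both project orthogonally onto the same subspace $S$.

```python
import math

def gram_schmidt(columns):
    """Classical Gram-Schmidt: orthonormalize linearly independent vectors."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(v, q))      # component of v along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis

# Columns of the interpolation operator (4.20a).
Phi_cols = [[1.5, 0.5, -0.5, 0.5], [-1.0, 1.0, 1.0, -1.0], [0.5, -0.5, 0.5, 1.5]]
Psi_cols = gram_schmidt(Phi_cols)

# Psi^* Psi should be the 3 x 3 identity ...
gram = [[sum(a * b for a, b in zip(u, v)) for v in Psi_cols] for u in Psi_cols]
# ... and Psi Psi^* should reproduce the orthogonal projector P of Example 4.7.
P = [[sum(q[i] * q[j] for q in Psi_cols) for j in range(4)] for i in range(4)]
print([[round(v, 6) for v in row] for row in P])
```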
4.3 Sequences

Now that we have a firm grasp of the matrix view of sampling and interpolation, 
we can move to spaces with domains associated with time and restrict the sampling 
and interpolation operators accordingly. In these developments, we emphasize the 
cases where ideally-matched sampling and interpolation operators are adjoints of 
each other, as they are well-suited to implementations using LSI filtering. 

Shift-Invariant Subspaces of Sequences We start by introducing a class of subspaces of $\ell^2(\mathbb{Z})$ that will play a prominent role in the material that follows.

Definition 4.3 (Shift-invariant subspaces of $\ell^2(\mathbb{Z})$) A subspace $W \subset \ell^2(\mathbb{Z})$ is a shift-invariant subspace with respect to shift $L \in \mathbb{Z}^+$ when $x_n \in W$ implies $x_{n-kL} \in W$ for every integer $k$. In addition, $w \in \ell^2(\mathbb{Z})$ is called a generator of $W$ when $W = \operatorname{span}(\{w_{n-kL}\}_{k\in\mathbb{Z}})$.



For example, the outputs of upsampling followed by filtering, (2.198), form a shift-invariant subspace with respect to integer multiples of 2: a shift of $x_n$ by $2k$ is still in the same subspace. We will see that the same is true for the sampling and interpolation operators we define shortly.
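Here is a finite-length illustration of this shift invariance. The sketch upsamples by 2 and filters with an arbitrary, hypothetical filter. It then shows that delaying the input by one sample delays the output by two samples. Zero-padded finite lists stand in for sequences in $\ell^2(\mathbb{Z})$.

```python
def upsample(y, N):
    """Insert N - 1 zeros after each sample (finite-length sketch)."""
    out = []
    for v in y:
        out.append(v)
        out.extend([0] * (N - 1))
    return out

def convolve(a, b):
    """Full linear convolution of two finite sequences starting at index 0."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

g = [1, 2, 3]                 # hypothetical postfilter
y = [4, -1, 5]                # hypothetical input

x = convolve(upsample(y, 2), g)
# Delaying the input by one sample (prepending a zero) ...
x_shifted_input = convolve(upsample([0] + y, 2), g)
# ... delays the output by two samples, so x shifted by 2 is again an output.
print(x)
print(x_shifted_input)
```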



Subspaces of Bandlimited Sequences A special case of shift-invariant subspaces of particular importance in signal processing is the subspace of bandlimited sequences; we now define it formally, and later we look at forming approximations in these subspaces through sampling and interpolation.



Definition 4.4 (Bandwidth for sequences) A … $x_n \in \ell^2(\mathbb{Z})$ … (4.23)
[Figure 4.12 panels: (a) $BL[-\pi/2, \pi/2]$. (b) $BL[-\pi/4, \pi/4]$. (c) $BL[-\pi/8, \pi/8]$.]

Figure 4.12: Sequences in $BL[-\pi/N, \pi/N]$ subspaces.

[Figure 4.13 panels: (a) Sampling. (b) Interpolation.]

Figure 4.13: Sampling and interpolation in $\ell^2(\mathbb{Z})$ with orthonormal sequences.



Definition 4.5 (Subspace of bandlimited sequences) A subspace $BL[-\omega_0/2, \omega_0/2] \subset \ell^2(\mathbb{Z})$ is a subspace of bandlimited sequences when all $x_n \in BL[-\omega_0/2, \omega_0/2]$ have bandwidth at most $\omega_0$.



A subspace of bandlimited sequences is shift invariant for any shift $L \in \mathbb{Z}^+$; in fact, subspaces of bandlimited sequences are the only ones that are simultaneously shift invariant for all shifts $L \in \mathbb{Z}^+$. To see shift invariance, take $x_n \in BL[-\omega_0/2, \omega_0/2]$. Then, (2.85) states that

$$x_{n-kL} \quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad e^{-j\omega kL} X(e^{j\omega});$$

the DTFT is multiplied by a complex exponential, which does not change the bandwidth of the shifted sequence. Figure 4.12 illustrates a few such subspaces.

4.3.1 Sampling and Interpolation with Orthonormal Sequences 

As we have done for finite-dimensional vectors, we start with the case when the relevant spaces are spanned by orthonormal sets. Apart from its intuitive geometric properties, this case is both simpler and better known in practice.

Sampling We refer to the operation depicted in Figure 4.13(a), involving filtering with $g_{-n}$ and downsampling by an integer $N > 1$, as sampling of the sequence $x_n \in \ell^2(\mathbb{Z})$ with prefilter $g_{-n}$, and denote it by $y_n = (\Phi^* x)_n$. Even though it results in an infinite number of samples, it involves a dimensionality reduction because there is only one output sample per $N$ input samples. We have seen this combination for $N = 2$ in Section 2.7.4 and the expression for the output in (2.195).

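Generalizing (2.195) for an arbitrary $N$, the output of sampling is

$$y_n = (g_{-n} * x_n)\big|_{Nn} = \sum_{k\in\mathbb{Z}} x_{Nn-k}\, g_{-k} = \sum_{k\in\mathbb{Z}} x_k\, g_{k-Nn} = \langle x_k, g_{k-Nn} \rangle_k \stackrel{(a)}{=} \langle x_k, \varphi_{k-Nn} \rangle_k = (\Phi^* x)_n, \tag{4.24}$$

where (a) follows from $\varphi_n = g_n$. The sampling operator $\Phi^*$ is now an infinite matrix (see (2.194) for $N = 2$), with rows equal to $\varphi_n^*$ and its shifts by integer multiples of $N$. We assume these rows to be orthonormal,

$$\langle \varphi_{n-N\ell}, \varphi_{n-Nk} \rangle_n = \delta_{\ell-k} \quad\Leftrightarrow\quad \Phi^*\Phi = I. \tag{4.25}$$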
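The chain of equalities in (4.24) can be checked on finitely supported sequences. The sketch represents sequences as Python dictionaries mapping indices to values, an illustrative choice. It compares filtering with $g_{-n}$ followed by downsampling against the inner products $\langle x_k, \varphi_{k-Nn}\rangle_k$ with $\varphi_n = g_n$. The hypothetical filter is real, so no conjugation is needed.

```python
def sample_via_filtering(x, g, N, n_range):
    """y_n = (g_{-n} * x)_{Nn}: filter with the time-reversed g, keep every Nth output.
    Sequences are dicts {index: value} with finite support."""
    g_rev = {-k: v for k, v in g.items()}
    def conv_at(m):
        return sum(xk * g_rev.get(m - k, 0) for k, xk in x.items())
    return [conv_at(N * n) for n in n_range]

def sample_via_inner_products(x, phi, N, n_range):
    """y_n = <x_k, phi_{k - N n}>_k for real-valued sequences."""
    return [sum(xk * phi.get(k - N * n, 0) for k, xk in x.items()) for n in n_range]

g = {0: 1.0, 1: -2.0, 2: 0.5}          # hypothetical real prefilter, phi_n = g_n
x = {-3: 2.0, -1: 1.0, 0: 4.0, 2: -1.0, 5: 3.0}
n_range = range(-3, 4)
y1 = sample_via_filtering(x, g, 3, n_range)
y2 = sample_via_inner_products(x, g, 3, n_range)
print(y1)
```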



As before, the sampling operator has a nontrivial null space, $S^\perp = \mathcal{N}(\Phi^*)$; the set $\{\varphi_{n-Nk}\}_{k\in\mathbb{Z}}$ spans its orthogonal complement, $S = \mathcal{N}(\Phi^*)^\perp = \operatorname{span}(\{\varphi_{n-Nk}\}_{k\in\mathbb{Z}})$. In other words, when a sequence $x_n \in \ell^2(\mathbb{Z})$ is sampled, the component that remains is in $S$ and is captured by $\Phi^* x$; the component that is lost due to sampling is in the null space $S^\perp$. We illustrate this with an example.



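Example 4.8 (Sampling in $\ell^2(\mathbb{Z})$) Choose $N = 2$ and the prefilter to be

$$g_{-n} = \frac{1}{\sqrt{2}}(\delta_n + \delta_{n+1}), \quad\text{that is,}\quad g_n = \frac{1}{\sqrt{2}}(\delta_n + \delta_{n-1}). \tag{4.26}$$

Then, from (2.194), the output is

$$\begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ \vdots \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} & \vdots & & & & & & \\ \cdots & 1 & 1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & 1 & \cdots \\ & & & & & & \vdots & \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ x_3 \\ \vdots \end{bmatrix} = \Phi^* x. \tag{4.27a}$$

Clearly then, for every two input samples, $x_{2k}$ and $x_{2k+1}$, we get one output sample, $y_k = (x_{2k} + x_{2k+1})/\sqrt{2}$. Also, the matrix $\Phi^*$ clearly has orthonormal rows, since their nonzero parts do not overlap. With $\alpha_k \in \mathbb{R}$, the null space of $\Phi^*$ and its orthogonal complement are

$$\mathcal{N}(\Phi^*) = \{x \in \ell^2(\mathbb{Z}) \mid x_{2k} = -x_{2k+1},\ k \in \mathbb{Z}\}, \qquad S = \Big\{ \sum_{k\in\mathbb{Z}} \alpha_k\, g_{n-2k} \Big\}. \tag{4.27b}$$

Then any $x_n \in \ell^2(\mathbb{Z})$ can be decomposed into a part belonging to the null space $S^\perp = \mathcal{N}(\Phi^*)$, which will be lost during sampling, and a second part belonging to $S$, which will be preserved during sampling.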
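For the pair of Example 4.8, the decomposition into a part in $S$ and a part in the null space can be sketched as follows. The sketch works on a finite window of an input sequence whose values are arbitrary. Sampling followed by interpolation gives the pairwise averages, and the remainder satisfies $x_{2k} = -x_{2k+1}$.

```python
import math

s = 1 / math.sqrt(2)

def haar_sample(x):
    """Example 4.8 sketch: y_k = (x_{2k} + x_{2k+1}) / sqrt(2) for an even-length list."""
    return [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]

def haar_interpolate(y):
    """Adjoint operator: each y_k produces x_{2k} = x_{2k+1} = y_k / sqrt(2)."""
    out = []
    for v in y:
        out += [s * v, s * v]
    return out

x = [3.0, 1.0, -2.0, 4.0, 0.5, 0.5]          # hypothetical input (finite window)
x_S = haar_interpolate(haar_sample(x))        # pairwise averages: the part kept in S
x_null = [a - b for a, b in zip(x, x_S)]     # the part in the null space

print([round(v, 12) for v in x_S])
print([round(v, 12) for v in x_null])
```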
[Figure 4.14 panels: (a) Interpolation followed by sampling. (b) Sampling followed by interpolation.]

Figure 4.14: Sampling and interpolation in $\ell^2(\mathbb{Z})$ with orthonormal sequences.



Interpolation We refer to the operation depicted in Figure 4.13(b), involving upsampling by an integer $N > 1$ and filtering with $g_n$, as interpolation of the sequence $y_n \in \ell^2(\mathbb{Z})$ with postfilter $g_n$, and denote it by $x_n = (\Phi y)_n$. We have seen this combination for $N = 2$ in Section 2.7.4 and the expression for the output in (2.198). With our choice of pre- and postfilters, the sampling and interpolation operators are adjoints of each other.

Generalizing (2.198) for an arbitrary $N$, the output of interpolation is

$$x_n = \sum_{k\in\mathbb{Z}} y_k\, g_{n-Nk} \stackrel{(a)}{=} \sum_{k\in\mathbb{Z}} y_k\, \varphi_{n-Nk} = (\Phi y)_n, \tag{4.28}$$

where (a) follows from $\varphi_n = g_n$. The interpolation operator $\Phi$ is also an infinite matrix (see (2.197) for $N = 2$), with columns equal to $\varphi_n$ and its shifts by integer multiples of $N$. Denoting the range of $\Phi$ by $S$ as before, this subspace is, of course, the same as the orthogonal complement of the null space of the sampling operator, as we have seen earlier. It is also a shift-invariant subspace with respect to integer multiples of $N$: a shift of $x_n$ by $kN$ is still in $S$.
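The adjointness of the operators in (4.24) and (4.28) can also be checked numerically: for any $x$ and $y$, $\langle \Phi^* x, y\rangle = \langle x, \Phi y\rangle$. The sketch uses integer-valued, finitely supported sequences with hypothetical values, stored as dictionaries, so the comparison is exact.

```python
def interpolate(y, g, N):
    """x_n = sum_k y_k g_{n - N k}; sequences are dicts {index: value}."""
    x = {}
    for k, yk in y.items():
        for m, gm in g.items():
            n = N * k + m
            x[n] = x.get(n, 0) + yk * gm
    return x

def sample(x, g, N, ks):
    """(Phi^* x)_k = sum_n x_n g_{n - N k} for real g, evaluated for k in ks."""
    return {k: sum(xn * g.get(n - N * k, 0) for n, xn in x.items()) for k in ks}

def inner(a, b):
    return sum(v * b.get(i, 0) for i, v in a.items())

g = {0: 1, 1: 3, 2: -2, 3: 1}      # hypothetical real filter, phi_n = g_n
N = 2
x = {-2: 1, 0: 5, 1: -1, 4: 2, 7: 3}
y = {-1: 2, 0: -3, 2: 1}

ks = range(-5, 6)                  # includes every k where y_k is nonzero
lhs = inner(sample(x, g, N, ks), y)
rhs = inner(x, interpolate(y, g, N))
print(lhs, rhs)                    # equal: Phi is the adjoint of Phi^*
```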

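Example 4.9 (Interpolation in $\ell^2(\mathbb{Z})$) From (4.27a), the output of interpolation is

$$\begin{bmatrix} \vdots \\ x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ x_3 \\ \vdots \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} & \vdots & & & \\ \cdots & 1 & 0 & 0 & \cdots \\ \cdots & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 1 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & \cdots \\ \cdots & 0 & 0 & 1 & \cdots \\ & & & \vdots & \end{bmatrix} \begin{bmatrix} \vdots \\ y_{-1} \\ y_0 \\ y_1 \\ \vdots \end{bmatrix} = \Phi y. \tag{4.29}$$

Clearly then, for every input sample $y_k$, we get two output samples, $x_{2k} = x_{2k+1} = y_k/\sqrt{2}$. The range of this operator is $S$ (the same as the orthogonal complement of the null space of the sampling operator in (4.27b)).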
Interpolation Followed by Sampling Interpolation followed by sampling is described by $\Phi^*\Phi$, as in Figure 4.14(a). Since by assumption (4.25) holds, $y_n$ is perfectly recovered. Equation (4.25) also shows that the condition for perfect recovery is the same as the set of sequences $\{\varphi_{n-Nk}\}_{k\in\mathbb{Z}}$ being orthonormal, as in (1.83). This set of sequences is not a basis for $\ell^2(\mathbb{Z})$; instead, it is an orthonormal basis for the subspace $S$ it spans.



Sampling Followed by Interpolation Sampling followed by interpolation is described by $P = \Phi\Phi^*$, as in Figure 4.14(b). We know this to be a more difficult sequence of operations to recover from perfectly, since sampling leads to a loss unless the input is in $S$.

As for finite-dimensional vectors, and given our choice of sampling and interpolation operators,

$$P^2 = \Phi\Phi^*\Phi\Phi^* \stackrel{(a)}{=} \Phi\Phi^* = P, \qquad P^* = (\Phi\Phi^*)^* = \Phi\Phi^* = P,$$

where (a) follows from (4.25). In other words, $P$ is idempotent and self-adjoint, that is, $P$ is an orthogonal projection operator. Then, by Theorem 1.26, $(Px)_n$ is the best least-squares approximation of $x_n$ in $S$; if $x_n \in S$, sampling followed by interpolation will perfectly recover $x_n$:

Theorem 4.6 (Recovery for sequences) Given is the system as in Figure 4.14(b) with sampling operator $\Phi^*$ from (4.24) and interpolation operator $\Phi$ satisfying (4.25). Then, with $S = \mathcal{R}(\Phi)$,

$$\hat{x}_n = (\Phi\Phi^* x)_n \tag{4.30}$$

is the best least-squares approximation of $x_n$ in $S$, that is,

$$\hat{x} = \arg\min_{x_S \in S} \|x - x_S\|_2, \qquad x - \hat{x} \perp S.$$

When $x_n \in S$, then $\hat{x}_n = x_n$.



Example 4.10 We now illustrate the above result.

Choose first $x_n \in S$, a piecewise-constant sequence over intervals of length 2,

$$x = \big[\, \cdots \;\; x_{-2} \;\; x_{-2} \;\; x_0 \;\; x_0 \;\; x_2 \;\; x_2 \;\; \cdots \,\big]^T.$$

Then the results of applying filtering by $g_{-n}$ from (4.26), downsampling by 2, upsampling by 2, and filtering by $g_n$ as in Figure 4.14(b) are, respectively,

$$\begin{aligned}
&\big[\, \cdots \;\; \sqrt{2}x_{-2} \;\; \tfrac{x_{-2}+x_0}{\sqrt{2}} \;\; \sqrt{2}x_0 \;\; \tfrac{x_0+x_2}{\sqrt{2}} \;\; \sqrt{2}x_2 \;\; \cdots \,\big]^T, \\
&\big[\, \cdots \;\; \sqrt{2}x_{-2} \;\; \sqrt{2}x_0 \;\; \sqrt{2}x_2 \;\; \cdots \,\big]^T, \\
&\big[\, \cdots \;\; \sqrt{2}x_{-2} \;\; 0 \;\; \sqrt{2}x_0 \;\; 0 \;\; \sqrt{2}x_2 \;\; 0 \;\; \cdots \,\big]^T, \\
&\big[\, \cdots \;\; x_{-2} \;\; x_{-2} \;\; x_0 \;\; x_0 \;\; x_2 \;\; x_2 \;\; \cdots \,\big]^T,
\end{aligned}$$

that is, perfect recovery of $x_n$.

Choose next $x_n \notin S$,
…
The best we can do now is compute an approximation. Applying $P = \Phi\Phi^*$,
…
which clearly belongs to $S$, since it is piecewise constant over intervals of length 2. It is also the closest sequence in $S$, since the error between $x_n$ and $\hat{x}_n$ is
…
orthogonal to $S$.
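Example 4.10 can be mimicked on a finite window. The sketch below applies $P = \Phi\Phi^*$ for the pair of Example 4.8, which amounts to pairwise averaging. It first recovers a sequence that is piecewise constant over pairs. It then checks that perturbing $\hat{x}$ within $S$ only increases the distance to $x$. The input values are illustrative.

```python
import math

s = 1 / math.sqrt(2)

def project(x):
    """P = Phi Phi^* for the pair of Example 4.8: replace each pair by its average."""
    out = []
    for k in range(len(x) // 2):
        y = s * (x[2 * k] + x[2 * k + 1])      # sampling: filter with g_{-n}, downsample
        out += [s * y, s * y]                  # interpolation: upsample, filter with g_n
    return out

def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

# x in S (piecewise constant over pairs) is recovered up to rounding.
x_pc = [1.0, 1.0, -3.0, -3.0, 2.5, 2.5]
assert dist2(project(x_pc), x_pc) < 1e-24

# For x not in S, other elements of S are farther away than P x.
x = [1.0, 0.0, 4.0, -2.0, 3.0, 5.0]           # hypothetical input
x_hat = project(x)
for delta in ([0.1, 0.1, 0, 0, 0, 0], [0, 0, -0.2, -0.2, 0, 0], [0, 0, 0, 0, 0.05, 0.05]):
    other = [a + d for a, d in zip(x_hat, delta)]   # still piecewise constant, so in S
    assert dist2(x, other) > dist2(x, x_hat)
print([round(v, 12) for v in x_hat])
```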



4.3.2 Sampling and Interpolation for Bandlimited Sequences 

Since a subspace of bandlimited sequences is also a shift-invariant subspace, everything we have seen thus far holds here as well. In particular, if $x_n \in BL[-\omega_0/2, \omega_0/2]$, the sampling operator $\Phi^*$ is given by (4.24), and the interpolation operator is $\Phi$, then by Theorem 4.6, $x_n$ is perfectly recovered after sampling followed by interpolation. We see that the requirement $x_n \in BL[-\omega_0/2, \omega_0/2]$ is more restrictive than $x_n \in S$. If, on the other hand, $x_n \notin BL[-\omega_0/2, \omega_0/2]$, then by Theorem 4.6, $P = \Phi\Phi^*$ will project onto $S$, which is not necessarily equal to $BL[-\omega_0/2, \omega_0/2]$.

For $P$ to orthogonally project onto $BL[-\omega_0/2, \omega_0/2]$, we clearly need more specific sampling and interpolation operators. We now establish what these operators must be using the machinery from Chapter 2.

Assume first that $x_n \in BL[-\omega_0/2, \omega_0/2]$ and that the sampling prefilter in Figure 4.14(b) is a simple multiplicative factor of $\sqrt{N}$.$^{79}$ Then the output after upsampling by $N$ is

$$Y_s(e^{j\omega}) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X\big(e^{j(\omega - 2\pi k/N)}\big). \tag{4.31}$$

$^{79}$Basically, the sampling prefilter is not present; multiplication by $\sqrt{N}$ is purely for convenience.



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavelets.org



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal







(a) Aliasing. (b) No aliasing.

Figure 4.15: Downsampling by $N$ followed by upsampling by $N$ of $x_n \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$. (a) When $\omega_0 > 2\pi/N$, spectral replicas overlap; $x_n$ cannot be recovered by LSI filtering for every $x_n \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$. (b) When $\omega_0 < 2\pi/N$, spectral replicas do not overlap; $x_n$ can be recovered by lowpass filtering by $g_n$. (Illustrated for $N = 4$.)



For some LSI postfilter $g_n$ to recover $x_n$, a pointwise multiplication of (4.31) by $G(e^{j\omega})$ must yield $X(e^{j\omega})$. We want the recovery to work for every $x_n \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$, so we cannot count on any property of $X(e^{j\omega})$ other than bandlimitedness (4.23). Thus, multiplication by $G(e^{j\omega})$ must compensate for the $1/\sqrt{N}$ factor in the $k = 0$ term of (4.31) and zero out all the other terms.
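Relation (4.31) can be checked numerically on a finite test sequence. The sketch below scales by $\sqrt{N}$ and keeps every $N$th sample, putting zeros elsewhere, which is downsampling followed by upsampling. It then compares the DTFT of the result with the right side of (4.31) at a few frequencies. The test data, the frequencies and the helper names are arbitrary choices.

```python
import cmath
import math

def dtft(x, w):
    """DTFT at frequency w of a finite sequence x supported on n = 0, ..., len(x) - 1."""
    return sum(xn * cmath.exp(-1j * w * n) for n, xn in enumerate(x))

def down_up(x, N):
    """Scale by sqrt(N), downsample by N, then upsample by N: keep x_n for n = 0 mod N."""
    return [math.sqrt(N) * xn if n % N == 0 else 0.0 for n, xn in enumerate(x)]

N = 4
x = [0.3, -1.2, 2.0, 0.7, 1.1, -0.4, 0.9, 0.05, -2.2, 1.3]   # arbitrary test data
y = down_up(x, N)

# Compare the DTFT of y with the right side of (4.31) at a few frequencies.
gaps = [abs(dtft(y, w) - sum(dtft(x, w - 2 * math.pi * k / N) for k in range(N)) / math.sqrt(N))
        for w in (0.0, 0.37, 1.9, -2.6)]
print(max(gaps))   # at the level of floating-point rounding
```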

Whether the multiplication by $G(e^{j\omega})$ will recover $X(e^{j\omega})$ depends on whether the spectral replicas in (4.31) overlap. Figure 4.15 illustrates the two possibilities, which we now discuss in more detail.



Aliasing When $\omega_0 > 2\pi/N$ as in Figure 4.15(a), spectral replicas overlap, and no LSI filtering will succeed in recovering $x_n$ for every $x_n \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$. This confusion of frequencies is called aliasing. It is one of the best-known effects of sampling in general and is the reason why a prefilter is needed before downsampling. We will discuss aliasing in more detail in the next section.

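Sampling Theorem When $\omega_0 \le 2\pi/N$ as in Figure 4.15(b), spectral replicas do not overlap, and obtaining $\hat{x}_n = x_n$ requires $G(e^{j\omega})$ to be an ideal filter with cutoff frequency $\omega_0/2$. Choosing exactly $\omega_0 = 2\pi/N$ uniquely determines the postfilter as
$$G(e^{j\omega}) = \begin{cases} \sqrt{N}, & |\omega| \le \pi/N; \\ 0, & \text{otherwise}, \end{cases} \qquad \overset{\text{DTFT}}{\longleftrightarrow} \qquad g_n = \frac{1}{\sqrt{N}} \operatorname{sinc}\Big(\frac{\pi}{N} n\Big), \qquad (4.32)$$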



an ideal $N$th-band filter from Table 2.5. Intuitively, this tells us that $x_n$ can be recovered exactly after keeping only every $N$th sample when it occupies at most $1/N$th of the full band in the DTFT domain. Since $\{g_{n-Nk}\}_{k\in\mathbb{Z}}$ are orthonormal, we can choose the sampling prefilter to be $g_{-n}$,$^{80}$ leading to an orthogonal projection operator $P$. Because $x_n \in \mathrm{BL}[-\pi/N, \pi/N]$, this sampling prefilter has no effect on $x_n$. By construction, the orthonormal set $\{g_{n-Nk}\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $\mathrm{BL}[-\omega_0/2, \omega_0/2]$.

This discussion leads us to the sampling theorem for sequences: 



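Theorem 4.7 (Sampling theorem for sequences) Given is the system as in Figure 4.14(b) with interpolation postfilter $g_n$ from (4.32). Then,
$$x_n = \sum_{k\in\mathbb{Z}} x_{kN} \operatorname{sinc}\Big(\frac{\pi}{N}(n - kN)\Big) \quad\Longleftrightarrow\quad x_n \in \mathrm{BL}\Big[-\frac{\pi}{N}, \frac{\pi}{N}\Big]. \qquad (4.33)$$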



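The expression (4.33) comes from
$$\hat{x}_n = (\Phi\Phi^* x)_n \overset{(a)}{=} \sum_{k\in\mathbb{Z}} \langle x_i, g_{i-Nk}\rangle_i \, g_{n-Nk} \overset{(b)}{=} \frac{1}{N}\sum_{k\in\mathbb{Z}} \Big(\sum_{i\in\mathbb{Z}} x_i \operatorname{sinc}\Big(\frac{\pi}{N}(i-kN)\Big)\Big) \operatorname{sinc}\Big(\frac{\pi}{N}(n-kN)\Big) \overset{(c)}{=} \sum_{k\in\mathbb{Z}} x_{kN} \operatorname{sinc}\Big(\frac{\pi}{N}(n-kN)\Big),$$
where (a) follows from (4.24) and (4.28), and (b) from $g_n = (1/\sqrt{N})\operatorname{sinc}(\pi n/N)$. Step (c) follows from $x_n \in \mathrm{BL}[-\pi/N, \pi/N]$: the sequence $\operatorname{sinc}(\pi n/N)$ is an ideal lowpass filter with gain $N$ on $[-\pi/N, \pi/N]$, so its effect on $x_n$ is just multiplication by $N$, and the inner sum equals $N x_{kN}$. This theorem can also be seen as a corollary of Theorem 4.6.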
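The following sketch checks (4.33) numerically, with $\operatorname{sinc}(x) = \sin(x)/x$ as in (4.32). The test sequences are assumptions, chosen because their DTFTs are known triangles. The sequence $\operatorname{sinc}^2(\pi n/(2N))$ lies in $\mathrm{BL}[-\pi/N, \pi/N]$, whereas $\operatorname{sinc}^2(\pi n/N)$ has twice that bandwidth. The infinite sum is truncated to $|k| \le K$, so only a small truncation error is expected in the first case.

```python
import math

def sinc(x):
    """sin(x)/x with sinc(0) = 1, the convention used in (4.32)-(4.33)."""
    return 1.0 if x == 0 else math.sin(x) / x

def interpolate(x, N, n, K=2000):
    """Right-hand side of (4.33) for the sequence x (a function of n), truncated to |k| <= K."""
    return sum(x(k * N) * sinc(math.pi / N * (n - k * N)) for k in range(-K, K + 1))

N = 4

def x_band(n):
    """DTFT is a triangle supported on |w| <= pi/N: inside BL[-pi/N, pi/N]."""
    return sinc(math.pi * n / (2 * N)) ** 2

def x_wide(n):
    """DTFT is a triangle supported on |w| <= 2*pi/N: too wide, so aliasing occurs."""
    return sinc(math.pi * n / N) ** 2

err_band = max(abs(interpolate(x_band, N, n) - x_band(n)) for n in range(3 * N))
err_wide = max(abs(interpolate(x_wide, N, n) - x_wide(n)) for n in range(3 * N))
print(err_band)  # small: only the truncation of the sum remains
print(err_wide)  # not small: the samples x_{kN} no longer determine x_n
```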

Bandlimited Approximation of Sequences We now assume that $x_n \notin \mathrm{BL}[-\pi/N, \pi/N]$. Then, as a corollary to Theorem 4.6, since $P$ is an orthogonal projection operator and $S = \mathrm{BL}[-\pi/N, \pi/N]$:

Theorem 4.8 (Best least-squares bandlimited approximation) Given is the system as in Figure 4.14(b) with interpolation postfilter $g_n$ from (4.32). Then,
$$\hat{x}_n = (\Phi\Phi^* x)_n \qquad (4.34)$$
is the best least-squares approximation of $x_n$ in $\mathrm{BL}[-\pi/N, \pi/N]$, that is,
$$\hat{x}_n = \arg\min_{x_{\mathrm{BL}} \in \mathrm{BL}[-\pi/N, \pi/N]} \|x_n - x_{\mathrm{BL},n}\|_2, \qquad x_n - \hat{x}_n \perp \mathrm{BL}[-\pi/N, \pi/N].$$



80 We choose a time-reversed version to yield an orthogonal projection operator $P$. This time reversal has no effect on the ideal filter since its impulse response is symmetric.

(a) Sampling. (b) Interpolation.

Figure 4.16: Sampling and interpolation in $\ell^2(\mathbb{Z})$ with nonorthogonal sequences.



The effect of this approximation in the DTFT domain is a simple truncation of the spectrum of $x_n$ to $[-\pi/N, \pi/N]$:
$$\hat{X}(e^{j\omega}) = \begin{cases} X(e^{j\omega}), & |\omega| \le \pi/N; \\ 0, & \text{otherwise}. \end{cases}$$

Exercise 4.6 explores bandlimited spaces with rational sampling rate changes.
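Spectral truncation is an orthogonal projection, which is why it gives the least-squares optimum. The sketch below illustrates this on a finite-length stand-in, an assumption made for computability. Circular sequences of length $L$ with the DFT take the place of $\ell^2(\mathbb{Z})$ and the DTFT, and the bins with $|k| \le M$ are kept. The sketch checks that the error is orthogonal to every retained complex exponential and that the squared norms add up. The test data are arbitrary.

```python
import cmath
import math

L, M = 16, 2   # circular length, and the largest DFT bin index kept (|k| <= M)

def dft(x):
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L)) for k in range(L)]

def idft(X):
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / L) for k in range(L)) / L for n in range(L)]

def in_band(k):
    """Bin k (0 <= k < L) represents frequency 2*pi*k/L; keep it when |k| <= M modulo L."""
    return min(k, L - k) <= M

def norm2(v):
    return sum(t * t for t in v)

x = [math.sin(0.9 * n) + 0.5 * (-1) ** n + 0.1 * n for n in range(L)]   # arbitrary test data
X = dft(x)
x_hat = [v.real for v in idft([Xk if in_band(k) else 0 for k, Xk in enumerate(X)])]
e = [a - b for a, b in zip(x, x_hat)]

# The error is orthogonal to every retained complex exponential ...
worst_inner = max(abs(sum(e[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L)))
                  for k in range(L) if in_band(k))
# ... so the squared norms add up (Pythagoras), making x_hat the closest in-band vector.
pythagoras_gap = norm2(x) - norm2(x_hat) - norm2(e)
print(worst_inner, pythagoras_gap)
```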



4.3.3 Sampling and Interpolation with Nonorthogonal Sequences

As for finite-dimensional vectors, what we have seen thus far is a classical take on sampling and interpolation; we now expand it a bit to include nonorthogonal sequences. These make the geometry more complicated: the sampling and interpolation operators are no longer adjoints of each other. In addition, the two spaces we discussed earlier are no longer the same. These are the range of the interpolation operator and the orthogonal complement of the null space of the sampling operator.

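Sampling We now refer to the operation depicted in Figure 4.16(a) as sampling of the sequence $x_n \in \ell^2(\mathbb{Z})$ with prefilter $\tilde{g}_n$. This operation involves filtering with $\tilde{g}_n$ and downsampling by integer $N > 1$, and we denote it by $y_n = (\tilde{\Phi}^* x)_n$. This time, we do not make an assumption of orthonormality.

As in (4.24), we generalize (2.195) for an arbitrary $N$:
$$y_n = (\tilde{g} * x)_{Nn} = \sum_{k\in\mathbb{Z}} \tilde{g}_k\, x_{Nn-k} = \sum_{k\in\mathbb{Z}} x_k\, \tilde{g}_{Nn-k} \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k\, \tilde{\varphi}_{k-Nn} = \langle x_k, \tilde{\varphi}_{k-Nn}\rangle_k = (\tilde{\Phi}^* x)_n, \qquad (4.35)$$
where (a) follows from $\tilde{\varphi}_n = \tilde{g}_{-n}$. The sampling operator $\tilde{\Phi}^*$ is again an infinite matrix (see (2.194) for $N = 2$), with rows equal to $\tilde{\varphi}^*$ and its shifts by integer multiples of $N$. The null space of $\tilde{\Phi}^*$ is nontrivial (see Solved Exercise 4.3). The orthogonal complement of this null space is denoted $\widetilde{S}$ as before.

Example 4.11 (Sampling in $\ell^2(\mathbb{Z})$) Choose $N = 2$ and the prefilter to be
$$\tilde{g}_n = \frac{1}{8}\begin{bmatrix} \cdots & 0 & -1 & 2 & 6 & 2 & -1 & 0 & \cdots \end{bmatrix}, \qquad (4.36)$$
with the entry 6 at $n = 0$. Then from (2.194), the output is
$$\begin{bmatrix} \vdots \\ y_0 \\ y_1 \\ \vdots \end{bmatrix} = \frac{1}{8} \begin{bmatrix} \ddots & & & & & & & & \\ \cdots & -1 & 2 & 6 & 2 & -1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & -1 & 2 & 6 & 2 & -1 & \cdots \\ & & & & & & & & \ddots \end{bmatrix} \begin{bmatrix} \vdots \\ x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \end{bmatrix} = \tilde{\Phi}^* x. \qquad (4.37a)$$
Again, for every two input samples we get one output sample. Also, it is obvious that the matrix $\tilde{\Phi}^*$ does not have orthogonal rows. With $\alpha_k \in \mathbb{R}$, the null space of $\tilde{\Phi}^*$ and its orthogonal complement are
$$\mathcal{N}(\tilde{\Phi}^*) = \{x \in \ell^2(\mathbb{Z}) \mid -x_{2k-2} + 2x_{2k-1} + 6x_{2k} + 2x_{2k+1} - x_{2k+2} = 0,\ k \in \mathbb{Z}\},$$
$$\widetilde{S} = \Big\{\sum_{k\in\mathbb{Z}} \alpha_k\, \tilde{g}_{2k-n}\Big\}. \qquad (4.37b)$$
Then any $x_n \in \ell^2(\mathbb{Z})$ can be decomposed into two parts. The first belongs to the null space $\widetilde{S}^\perp = \mathcal{N}(\tilde{\Phi}^*)$ and will be lost during sampling. The second belongs to $\widetilde{S}$ and will be preserved during sampling.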
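Both claims in Example 4.11 can be checked with exact rational arithmetic. The sketch below computes the inner product of two neighboring rows of $\tilde{\Phi}^*$. It also exhibits one finitely supported element of $\mathcal{N}(\tilde{\Phi}^*)$, namely $x_n = (-1)^{n-1}\tilde{g}_{n-1}$, that is, $X(z) = z^{-1}\tilde{G}(-z)$. That particular element is our own choice, not taken from the text, and the helper names are illustrative.

```python
from fractions import Fraction as F

# Prefilter (4.36): g~_n = (1/8)[-1, 2, 6, 2, -1] for n = -2, ..., 2. It is symmetric,
# so phi~_n = g~_{-n} = g~_n and the rows of Phi~^* are g~_{n-2k}.
g_t = {n: F(c, 8) for n, c in zip(range(-2, 3), (-1, 2, 6, 2, -1))}

def sample(x, k):
    """(Phi~^* x)_k = <x_n, phi~_{n-2k}>_n for a finitely supported x given as {n: x_n}."""
    return sum(xn * g_t.get(n - 2 * k, 0) for n, xn in x.items())

# Inner product of two neighbouring rows of Phi~^*: nonzero, so the rows are not orthogonal.
rows_inner = sum(g_t[n] * g_t.get(n - 2, 0) for n in g_t)
print(rows_inner)   # -1/8

# A finitely supported, nonzero element of the null space: x_n = (-1)^(n-1) g~_{n-1},
# i.e. X(z) = z^{-1} G~(-z); then G~(z) X(z) has only odd powers of z, so every
# even-indexed output of the filter (which is what sampling keeps) vanishes.
x_null = {n: (1 if (n - 1) % 2 == 0 else -1) * g_t[n - 1] for n in range(-1, 4)}
print([sample(x_null, k) for k in range(-3, 4)])   # all zero
```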



Interpolation Again, we refer to the operation depicted in Figure 4.16(b) as interpolation of the sequence $y_n \in \ell^2(\mathbb{Z})$ with postfilter $g_n$. This operation involves upsampling by integer $N > 1$ and filtering with $g_n$, and we denote it by $\hat{x}_n = (\Phi y)_n$. This time, however, it is not the adjoint of the sampling operator $\tilde{\Phi}^*$. It is, however, formally the same as in Figure 4.14(b) and (4.28). When the interpolation operator is specially chosen to be the pseudoinverse of $\tilde{\Phi}^*$, then $S = \widetilde{S}$, by the same arguments as in (4.19).

Both $S$, the range of the interpolation operator $\Phi$ from (4.28), and $\widetilde{S}$, the orthogonal complement of the null space of the sampling operator $\tilde{\Phi}^*$ from (4.35), are shift-invariant subspaces with respect to integer multiples of $N$ (see Exercise 4.5).

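Interpolation Followed by Sampling Interpolation followed by sampling is described by $\tilde{\Phi}^*\Phi$, as in Figure 4.17(a). It is possible for $y_n$ to be perfectly recovered; this happens when
$$\tilde{\Phi}^*\Phi = I \quad\Longleftrightarrow\quad \langle \varphi_{n-Ni}, \tilde{\varphi}_{n-Nk}\rangle_n = \delta_{i-k}. \qquad (4.38)$$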



The sampling and interpolation operators are then called consistent. Choosing the pseudoinverse in (4.18) for $\Phi$ would satisfy (4.21); of course, there exist infinitely many other $\Phi$ we could also use. The above also shows that the condition for perfect recovery is the same as the sets of sequences $\{\varphi_{n-Nk}\}_{k\in\mathbb{Z}}$ and $\{\tilde{\varphi}_{n-Nk}\}_{k\in\mathbb{Z}}$ being biorthogonal, as in (1.102). These sets of sequences are not bases for $\ell^2(\mathbb{Z})$; instead, they form a biorthogonal pair of bases for the subspaces $S$ and $\widetilde{S}$ they span, respectively.

(a) Interpolation followed by sampling. (b) Sampling followed by interpolation.

Figure 4.17: Sampling and interpolation in $\ell^2(\mathbb{Z})$ with nonorthogonal sequences.

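Example 4.12 (Interpolation followed by sampling in $\ell^2(\mathbb{Z})$) We start with $N = 2$ and assume $\tilde{g}_n$ to be as in (4.36). We would like to find an interpolation operator such that (4.38) is satisfied. Assume for a moment that our interpolation postfilter $g_n$ is of the following form:
$$g_n = a\,\delta_{n+1} + b\,\delta_n + a\,\delta_{n-1}.$$
To satisfy (4.38), we need $\langle g_n, \tilde{g}_n\rangle_n = (4a + 6b)/8 = 1$ and $\langle g_n, \tilde{g}_{n-2}\rangle_n = (2a - b)/8 = 0$, that is, the following system of equations:
$$2a + 3b = 4, \qquad 2a - b = 0,$$
with the solution $a = 1/2$, $b = 1$, or
$$g_n = \begin{bmatrix} \cdots & 0 & \tfrac{1}{2} & 1 & \tfrac{1}{2} & 0 & \cdots \end{bmatrix}, \qquad (4.39)$$
and the operator $\Phi$ is an infinite matrix with $g_{n-2k}$ as its columns. With $\alpha_k \in \mathbb{R}$, the range of $\Phi$ is
$$S = \Big\{\sum_{k\in\mathbb{Z}} \alpha_k\, g_{n-2k}\Big\}. \qquad (4.40)$$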
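The biorthogonality conditions (4.38) for the filters (4.36) and (4.39) can be verified exactly. The sketch below uses Python's `fractions` module, and the helper name `cross` is illustrative. Since both filters are symmetric, $\tilde{\varphi}_n = \tilde{g}_{-n} = \tilde{g}_n$.

```python
from fractions import Fraction as F

g_t = {n: F(c, 8) for n, c in zip(range(-2, 3), (-1, 2, 6, 2, -1))}   # prefilter (4.36)
g = {-1: F(1, 2), 0: F(1), 1: F(1, 2)}                                 # postfilter (4.39)

def cross(shift):
    """<phi_n, phi~_{n - shift}>_n with phi_n = g_n and phi~_n = g~_{-n} = g~_n."""
    return sum(gn * g_t.get(n - shift, 0) for n, gn in g.items())

# (4.38) for N = 2: the inner products must equal 1 for shift 0 and 0 for every other
# even shift; given the filter lengths, only shifts -2, 0, 2 could be nonzero at all.
print({s: cross(s) for s in (-4, -2, 0, 2, 4)})
```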

Sampling Followed by Interpolation Sampling followed by interpolation is described by $P = \Phi\tilde{\Phi}^*$, as in Figure 4.17(b). When the sampling and interpolation operators are consistent as in (4.38), then $P$ is idempotent, meaning it is a projection operator. It projects onto $S$, but the projection is not orthogonal. The approximation error $x_n - \hat{x}_n$ lies in $\mathcal{N}(\tilde{\Phi}^*)$, since $\tilde{\Phi}^*(x - Px) = \tilde{\Phi}^* x - (\tilde{\Phi}^*\Phi)\tilde{\Phi}^* x = 0$. It is thus orthogonal to $\widetilde{S}$ but not to $S$ (recall Figure 4.11 for a conceptual picture).

Again, for $P$ to be self-adjoint as well, $\Phi$ must be chosen to be the pseudoinverse of $\tilde{\Phi}^*$, (4.18). The sampling and interpolation operators are then ideally matched, and subspaces $S$ and $\widetilde{S}$ are identical.

The previous discussion can be summarized as follows:

Theorem 4.9 (Recovery for sequences) Given is the system as in Figure 4.17(b) with sampling operator $\tilde{\Phi}^*$ from (4.35) and interpolation operator $\Phi$ from (4.28). Then, with $S = \mathcal{R}(\Phi)$ and $\widetilde{S} = \mathcal{N}(\tilde{\Phi}^*)^\perp$:

(i) When $P = \Phi\tilde{\Phi}^*$ is idempotent, that is, sampling and interpolation operators are consistent, then $P$ is a projection operator. When $x_n \in S$, $x_n$ is perfectly recovered.

(ii) When $P = \Phi\tilde{\Phi}^*$ is idempotent and self-adjoint, that is, sampling and interpolation operators are consistent and ideally matched, then $P$ is an orthogonal projection operator. When $x_n \in S$, $x_n$ is perfectly recovered.



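Example 4.13 (Sampling followed by interpolation in $\ell^2(\mathbb{Z})$) The interpolation operator $\Phi$ in Example 4.12 is not the pseudoinverse of the sampling operator $\tilde{\Phi}^*$ in (4.37a) (Exercise 4.8 computes the pseudoinverse). The two are thus not ideally matched, and $P$ is not self-adjoint:
$$P = \frac{1}{16}\begin{bmatrix} \ddots & & & & & & & & & \\ \cdots & -2 & 4 & 12 & 4 & -2 & 0 & 0 & 0 & \cdots \\ \cdots & -1 & 2 & 5 & 4 & 5 & 2 & -1 & 0 & \cdots \\ \cdots & 0 & 0 & -2 & 4 & 12 & 4 & -2 & 0 & \cdots \\ \cdots & 0 & 0 & -1 & 2 & 5 & 4 & 5 & 2 & \cdots \\ & & & & & & & & & \ddots \end{bmatrix}.$$
The operators are only consistent, meaning $P$ is a projection operator. The subspaces $S$ and $\widetilde{S}$ are not the same.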
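To check Example 4.13 without handling infinite matrices, the sketch below works with circular sequences of length $L = 12$. This is an assumption; the length is enough that the short filter supports never overlap their own periodic copies. The sketch builds $P = \Phi\tilde{\Phi}^*$ from its entries $P_{n,m} = \sum_k \varphi_{n-2k}\tilde{\varphi}_{m-2k}$ and tests idempotence and self-adjointness exactly. The helper names are illustrative.

```python
from fractions import Fraction as F

L = 12   # assumed circular length, long enough that no filter overlaps its own periodic copy
g_t = {n: F(c, 8) for n, c in zip(range(-2, 3), (-1, 2, 6, 2, -1))}   # prefilter (4.36)
g = {-1: F(1, 2), 0: F(1), 1: F(1, 2)}                                 # postfilter (4.39)

def periodize(h):
    """Wrap a short filter {n: h_n} onto indices 0, ..., L - 1."""
    out = [F(0)] * L
    for n, c in h.items():
        out[n % L] += c
    return out

phi, phi_t = periodize(g), periodize(g_t)   # phi~_n = g~_{-n} = g~_n (symmetric)

# Matrix of P = Phi Phi~^*: P[n][m] = sum_k phi_{n-2k} phi~_{m-2k}, k over one period.
P = [[sum(phi[(n - 2 * k) % L] * phi_t[(m - 2 * k) % L] for k in range(L // 2))
      for m in range(L)] for n in range(L)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(L)) for j in range(L)] for i in range(L)]

idempotent = matmul(P, P) == P
self_adjoint = all(P[n][m] == P[m][n] for n in range(L) for m in range(L))
print(idempotent, self_adjoint)                   # True False
print([16 * P[0][m % L] for m in range(-2, 3)])   # row n = 0, columns m = -2..2
print([16 * P[1][m % L] for m in range(-2, 5)])   # row n = 1, columns m = -2..4
```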



4.4 Functions 

In this section, we study sampling and interpolation operators that map between the continuous domain, $\mathcal{L}^2(\mathbb{R})$, and the discrete one, $\ell^2(\mathbb{Z})$. As before, we concentrate on structured operators, those that can be implemented using LSI filtering. Our development closely parallels the discrete-time case in the previous section.

Shift-Invariant Subspaces of Functions We start by introducing a class of subspaces of $\mathcal{L}^2(\mathbb{R})$ that will play a prominent role in the material that follows.



Definition 4.10 (Shift-invariant subspaces of $\mathcal{L}^2(\mathbb{R})$) A subspace $W \subset \mathcal{L}^2(\mathbb{R})$ is a shift-invariant subspace with respect to shift $\tau \in \mathbb{R}^+$ when $x(t) \in W$ implies $x(t - k\tau) \in W$ for every integer $k$. In addition, $w \in \mathcal{L}^2(\mathbb{R})$ is called a generator of $W$ when $W = \operatorname{span}(\{w(t - k\tau)\}_{k\in\mathbb{Z}})$.

For example, the outputs of the interpolation operator in (4.3) form a shift-invariant subspace with respect to integer shifts: a shift of $x(t)$ by $k$ is still in the same subspace. The sampling and interpolation operators we define shortly will also have a shift-invariance property.

Subspaces of Bandlimited Functions A special case of shift-invariant subspaces of 
particular importance in signal processing is the subspace of bandlimited functions; 
we now define it formally and later we look at forming approximations in these 
subspaces through sampling and interpolation. 

Definition 4.11 (Bandwidth for functions) A function $x(t) \in \mathcal{L}^2(\mathbb{R})$ is said to have bandwidth $\omega_0$ for the smallest $\omega_0 \in \mathbb{R}^+$ such that the Fourier transform $X(\omega)$ satisfies
$$X(\omega) = 0 \quad \text{for all } |\omega| > \frac{\omega_0}{2}. \qquad (4.41)$$

Definition 4.12 (Subspace of bandlimited functions) A subspace $\mathrm{BL}[-\omega_0/2, \omega_0/2] \subset \mathcal{L}^2(\mathbb{R})$ is a subspace of bandlimited functions when all $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$ have bandwidth at most $\omega_0$.



A subspace of bandlimited functions is shift invariant for any shift $\tau \in \mathbb{R}^+$; in fact, subspaces of bandlimited functions are the only ones that are simultaneously shift invariant for all shifts $\tau \in \mathbb{R}^+$. To see shift invariance, take $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$. Then, (3.56) states that
$$x(t - k\tau) \quad\overset{\text{FT}}{\longleftrightarrow}\quad e^{-j\omega k\tau} X(\omega):$$
the Fourier transform is multiplied by a complex exponential, not changing the bandwidth of the shifted function.

4.4.1 Sampling and Interpolation with Orthonormal Functions 

In Section 4.1 we introduced sampling and interpolation operators and their combinations, operating on the shift-invariant space of piecewise-constant functions over unit intervals. This was an example where subspaces were spanned by orthonormal functions. We saw that perfect recovery after sampling and interpolation was guaranteed by the specific choice of operators as well as the function subspace. We now generalize this discussion to any sampling interval $T$ and any $x(t) \in \mathcal{L}^2(\mathbb{R})$.

Sampling We refer to the operation depicted in Figure 4.4(a) as sampling of the function $x(t) \in \mathcal{L}^2(\mathbb{R})$ with prefilter $g(-t)$. This operation involves filtering with $\varphi(-t) = g(-t)$ and sampling at $t = nT$, and we denote it by $y_n = (\Phi^* x)_n$. Through this operation, we move from the larger space $\mathcal{L}^2(\mathbb{R})$ into the smaller one $\ell^2(\mathbb{Z})$.

Generalizing (4.2) for an arbitrary $T$, the output of sampling is
$$y_n = \big(g(-t) * x(t)\big)\big|_{t=nT} = \int_{-\infty}^{\infty} x(t)\, g(t - nT)\, dt \overset{(a)}{=} \int_{-\infty}^{\infty} x(t)\, \varphi(t - nT)\, dt = \langle x(t), \varphi(t - nT)\rangle_t = (\Phi^* x)_n, \qquad (4.42)$$
where (a) follows from $\varphi(t) = g(t)$. Because the domain of the sampling operator $\Phi^*$ is functions (instead of sequences), we cannot write $\Phi^*$ as an infinite matrix. However, there is a strong similarity to sequences: the components of $\Phi^* x$ are obtained as inner products between $x(t)$ and shifted versions of a single function $\varphi(t)$. We assume $\{\varphi(t - kT)\}_{k\in\mathbb{Z}}$ to be an orthonormal set,
$$\langle \varphi(t - nT), \varphi(t - kT)\rangle = \delta_{n-k} \quad\Longleftrightarrow\quad \Phi^*\Phi = I. \qquad (4.43)$$
As before, the sampling operator has a nontrivial null space, $S^\perp = \mathcal{N}(\Phi^*)$. The set $\{\varphi(t - kT)\}_{k\in\mathbb{Z}}$ spans its orthogonal complement, $S = \mathcal{N}(\Phi^*)^\perp = \operatorname{span}(\{\varphi(t - kT)\}_{k\in\mathbb{Z}})$. In other words, when a function $x(t) \in \mathcal{L}^2(\mathbb{R})$ is sampled, the component that remains is in $S$ and is captured by $\Phi^* x$. The component that is lost due to sampling is in the null space $S^\perp$. Section 4.1 illustrated this sampling operator with $T = 1$ and $\varphi(t) = g(t) = \chi_{[0,1)}(t)$, the indicator function of the unit interval.

Interpolation We refer to the operation depicted in Figure 4.4(b) as interpolation of the sequence $y_n \in \ell^2(\mathbb{Z})$ with postfilter $g(t)$. This operation involves pointwise multiplication with a Dirac delta comb $s_T(t)$, (3.7), and filtering with $g(t)$, and we denote it by $\hat{x}(t) = (\Phi y)(t)$.

Generalizing (4.3) for an arbitrary $T$, the output of interpolation is
$$\hat{x}(t) = \sum_{n\in\mathbb{Z}} y_n\, g(t - nT) \overset{(a)}{=} \sum_{n\in\mathbb{Z}} y_n\, \varphi(t - nT) = (\Phi y)(t), \qquad (4.44)$$
where (a) follows from $\varphi(t) = g(t)$. Denoting the range of $\Phi$ by $S$ as before, this subspace is, of course, the same as the orthogonal complement of the null space of the sampling operator, as we have seen earlier. It is also a shift-invariant subspace with respect to integer multiples of $T$; a shift of $\hat{x}(t)$ by $nT$ is still in $S$. Section 4.1 illustrated this interpolation operator with $T = 1$ and $\varphi(t) = g(t) = \chi_{[0,1)}(t)$.

With this choice of pre- and postfilters, the sampling and interpolation operators are adjoints of each other,
$$\langle \Phi^* x, y\rangle_{\ell^2} \overset{(a)}{=} \Big\langle \int_{-\infty}^{\infty} x(\tau)\, g(\tau - nT)\, d\tau,\ y_n\Big\rangle_{\ell^2} \overset{(b)}{=} \sum_{n\in\mathbb{Z}} y_n \int_{-\infty}^{\infty} x(\tau)\, g(\tau - nT)\, d\tau \overset{(c)}{=} \int_{-\infty}^{\infty} x(\tau) \sum_{n\in\mathbb{Z}} y_n\, g(\tau - nT)\, d\tau \overset{(d)}{=} \langle x, \Phi y\rangle_{\mathcal{L}^2}, \qquad (4.45)$$
where (a) follows from (4.42); (b) from the definition of the $\ell^2$ inner product; in (c) we interchanged the order of summation and integration; and (d) follows from the definition of the $\mathcal{L}^2$ inner product as well as (4.44).

Interpolation Followed by Sampling Interpolation followed by sampling is described by $\Phi^*\Phi$ as in Figure 4.4(a),
$$(\Phi^*\Phi y)_n \overset{(a)}{=} \Big(\Phi^* \sum_{k\in\mathbb{Z}} y_k\, g(t - kT)\Big)_n \overset{(b)}{=} \int_{-\infty}^{\infty} \Big(\sum_{k\in\mathbb{Z}} y_k\, g(t - kT)\Big) g(t - nT)\, dt \overset{(c)}{=} \sum_{k\in\mathbb{Z}} y_k \int_{-\infty}^{\infty} g(t - kT)\, g(t - nT)\, dt \overset{(d)}{=} y_n,$$
where (a) follows from the expression for the interpolation operator, (4.44); (b) from the expression for the sampling operator, (4.42); in (c) we interchanged summation and integration; and (d) follows from our assumption, (4.43), that $\{\varphi(t - kT)\}_{k\in\mathbb{Z}}$ is an orthonormal set. Thus, $y_n$ is perfectly recovered. Equation (4.43) also shows that the condition for perfect recovery is the same as the set of functions $\{\varphi(t - kT)\}_{k\in\mathbb{Z}}$ being orthonormal, as in (1.83). This set of functions is not a basis for $\mathcal{L}^2(\mathbb{R})$; instead, it is an orthonormal basis for the subspace $S$ it spans. Section 4.1 illustrated interpolation followed by sampling with $T = 1$ and $\varphi(t) = g(t) = \chi_{[0,1)}(t)$.

Sampling Followed by Interpolation Sampling followed by interpolation is described by $P = \Phi\Phi^*$ as in Figure 4.4(b).

As for sequences, and given our choice of sampling and interpolation operators,
$$P^2 = \Phi\Phi^*\Phi\Phi^* \overset{(a)}{=} \Phi\Phi^* = P, \qquad P^* = (\Phi\Phi^*)^* = \Phi\Phi^* = P,$$
where (a) follows from (4.43). In other words, $P$ is idempotent and self-adjoint, that is, $P$ is an orthogonal projection operator. Then, by Theorem 1.26, $(Px)(t)$ is the best least-squares approximation of $x(t)$ in $S$; if $x(t) \in S$, sampling followed by interpolation will perfectly recover $x(t)$:

Theorem 4.13 (Recovery for functions) Given is the system as in Figure 4.4(b) with sampling operator $\Phi^*$ from (4.42) and interpolation operator $\Phi$ satisfying (4.43). Then, with $S = \mathcal{R}(\Phi)$,
$$\hat{x}(t) = (\Phi\Phi^* x)(t) \qquad (4.46)$$
is the best least-squares approximation of $x(t)$ in $S$, that is,
$$\hat{x}(t) = \arg\min_{x_S(t) \in S} \|x(t) - x_S(t)\|_2, \qquad x(t) - \hat{x}(t) \perp S.$$
When $x(t) \in S$, then $\hat{x}(t) = x(t)$.

Figure 4.18: The part of Figure 4.4(b) between the sampling prefilter and the interpolation postfilter: sampling of the function $y(t)$ results in a sampled function $y_s(t)$.

Section 4.1 illustrated sampling followed by interpolation with $T = 1$, $\varphi(t) = g(t) = \chi_{[0,1)}(t)$, and $x(t) \in S$.
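Theorem 4.13 can be checked exactly for the setting of Section 4.1 ($T = 1$, $\varphi(t) = \chi_{[0,1)}(t)$), where $\Phi\Phi^*$ replaces $x(t)$ on each interval $[n, n+1)$ by its average. The sketch assumes a test function $x(t) = t^2$ on $[0, 4)$, zero elsewhere (our choice), and uses exact rational arithmetic. It confirms the Pythagorean identity $\|x - x_c\|^2 = \|x - \hat{x}\|^2 + \sum_n (c_n - y_n)^2$ for several piecewise-constant candidates $x_c$ with levels $c_n$. This shows that no candidate beats $\hat{x}$.

```python
from fractions import Fraction as F

# Hypothetical test function: x(t) = t^2 on [0, 4), zero elsewhere.
# Sampling uses phi(t) = chi_[0,1)(t) and T = 1, as in Section 4.1.

def moment(p, n):
    """Exact integral of t**p over [n, n + 1)."""
    return F((n + 1) ** (p + 1) - n ** (p + 1), p + 1)

# Phi^* x: y_n = <x(t), phi(t - n)> = integral of t^2 over [n, n + 1).
y = [moment(2, n) for n in range(4)]

def sq_error(levels):
    """||x - sum_n c_n phi(t - n)||^2 for levels c_n on [n, n + 1), computed exactly."""
    return sum(moment(4, n) - 2 * c * moment(2, n) + c * c for n, c in enumerate(levels))

# x_hat = Phi Phi^* x has level y_n on [n, n + 1). Compare it with other candidates.
candidates = [
    [(n + F(1, 2)) ** 2 for n in range(4)],   # midpoint values x(n + 1/2)
    [F(n * n) for n in range(4)],             # left-endpoint values x(n)
    [yn + F(1, 10) for yn in y],              # a small perturbation of x_hat
]
# Pythagoras: the excess error of each candidate is exactly sum_n (c_n - y_n)^2 >= 0,
# so each entry of `excess` below should be exactly zero.
excess = [sq_error(c) - sq_error(y) - sum((cn - yn) ** 2 for cn, yn in zip(c, y))
          for c in candidates]
print(y, sq_error(y), excess)
```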



4.4.2 Sampling and Interpolation for Bandlimited Functions 

Since a subspace of bandlimited functions is also a shift-invariant subspace, everything we have seen thus far holds here as well. In particular, if $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2] \subseteq S$, the sampling operator $\Phi^*$ is given by (4.42), and the interpolation operator is $\Phi$, then by Theorem 4.13 $x(t)$ is perfectly recovered after sampling followed by interpolation. For $P$ to orthogonally project onto $\mathrm{BL}[-\omega_0/2, \omega_0/2]$, we clearly need the sampling and interpolation operators, and thus $P$, to be a bit more specific. We now establish what they must be, using the machinery from Chapter 3.

We first discuss the sequence of operations between the sampling prefilter and the interpolation postfilter in Figure 4.4(b), depicted separately in Figure 4.18. The function $y(t)$ multiplied by the Dirac delta comb $s_T(t)$, (3.7), produces a sampled function $y_s(t)$. Using (3.73a),



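$$s_T(t) = \sum_{n\in\mathbb{Z}} \delta(t - nT) \quad\overset{\text{FT}}{\longleftrightarrow}\quad S_T(\omega) = \frac{2\pi}{T}\sum_{k\in\mathbb{Z}} \delta\Big(\omega - \frac{2\pi}{T}k\Big), \qquad (4.47)$$
the sampled function $y_s(t)$ can be compactly represented as
$$y_s(t) = y(t)\, s_T(t) = y(t) \sum_{n\in\mathbb{Z}} \delta(t - nT) \overset{(a)}{=} \sum_{n\in\mathbb{Z}} y(nT)\, \delta(t - nT), \qquad (4.48)$$
where (a) follows from the sampling property of the Dirac delta function in Table 3.1. Let us now find the Fourier transform of $y_s(t)$ as well as the DTFT of $y_n$:
$$y_s(t) \quad\overset{\text{FT}}{\longleftrightarrow}\quad Y_s(\omega) = \int_{-\infty}^{\infty} \sum_{n\in\mathbb{Z}} y(nT)\, \delta(t - nT)\, e^{-j\omega t}\, dt = \sum_{n\in\mathbb{Z}} y(nT)\, e^{-j\omega nT}, \qquad (4.49a)$$
$$y_n = y(nT) \quad\overset{\text{DTFT}}{\longleftrightarrow}\quad Y(e^{j\omega}) = \sum_{n\in\mathbb{Z}} y(nT)\, e^{-j\omega n}. \qquad (4.49b)$$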


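From this, we see that the Fourier transform of the sampled function and the DTFT of the sequence of samples are related by
$$Y(e^{j\omega}) = Y_s\Big(\frac{\omega}{T}\Big), \qquad (4.50)$$
that is, they are the same up to a rescaling of the frequency axis by $T$, which turns the $2\pi/T$-periodic $Y_s$ into a $2\pi$-periodic function. An alternative version of $Y_s(\omega)$ is often useful:
$$Y_s(\omega) \overset{(a)}{=} \frac{1}{2\pi}\, Y(\omega) * S_T(\omega) \overset{(b)}{=} \frac{1}{T}\, Y(\omega) * \sum_{k\in\mathbb{Z}} \delta\Big(\omega - \frac{2\pi}{T}k\Big) \overset{(c)}{=} \frac{1}{T} \sum_{k\in\mathbb{Z}} Y\Big(\omega - \frac{2\pi}{T}k\Big), \qquad (4.51)$$
where (a) follows from the convolution in frequency, (3.65); (b) from (4.47); and (c) from the shifting property of the Dirac delta function (see Table 3.1). Together with (4.50), this gives an expression that connects the DTFT of a sequence of samples $y_n$ with the Fourier transform of its underlying continuous-time function $y(t)$:
$$Y(e^{j\omega}) = Y_s\Big(\frac{\omega}{T}\Big) = \frac{1}{T} \sum_{k\in\mathbb{Z}} Y\Big(\frac{\omega - 2\pi k}{T}\Big). \qquad (4.52)$$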
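Relation (4.52) is a form of the Poisson summation formula and can be checked numerically. The sketch assumes the Gaussian $y(t) = e^{-t^2/2}$, whose Fourier transform $Y(\omega) = \sqrt{2\pi}\,e^{-\omega^2/2}$ is known in closed form, and truncates both infinite sums. It also includes $T = 2$, where the replicas overlap; (4.52) still holds there, since it does not require bandlimitedness.

```python
import cmath
import math

# Assumed test pair: y(t) = exp(-t^2/2) and its Fourier transform
# Y(w) = sqrt(2*pi) * exp(-w^2/2); both decay fast, so short truncated sums suffice.
def y(t):
    return math.exp(-t * t / 2)

def Y(w):
    return math.sqrt(2 * math.pi) * math.exp(-w * w / 2)

def dtft_of_samples(w, T, n_max=200):
    """Left side of (4.52): the DTFT of y_n = y(nT), i.e. sum_n y(nT) e^{-j w n}."""
    return sum(y(n * T) * cmath.exp(-1j * w * n) for n in range(-n_max, n_max + 1))

def replicas(w, T, k_max=50):
    """Right side of (4.52): (1/T) sum_k Y((w - 2*pi*k) / T)."""
    return sum(Y((w - 2 * math.pi * k) / T) for k in range(-k_max, k_max + 1)) / T

worst = max(abs(dtft_of_samples(w, T) - replicas(w, T))
            for T in (0.5, 1.0, 2.0) for w in (-3.0, -0.4, 0.0, 1.3, 2.9))
print(worst)   # at the level of floating-point rounding, for every T tested
```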



Assume now that $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$ and that the sampling prefilter in Figure 4.4(b) is a simple multiplicative factor of $\sqrt{T}$,$^{81}$ so that
$$y(t) = \sqrt{T}\, x(t), \qquad (4.53)$$
leading to
$$Y_s(\omega) \overset{(a)}{=} \frac{1}{\sqrt{T}} \sum_{k\in\mathbb{Z}} X\Big(\omega - \frac{2\pi}{T}k\Big), \qquad (4.54)$$
where (a) follows from (4.51). For some LSI postfilter $g(t)$ to recover $x(t)$, a pointwise multiplication of (4.54) by $G(\omega)$ must yield $X(\omega)$. We want the recovery to work for every $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$, so we cannot count on any property of $X(\omega)$ other than bandlimitedness (4.41). Thus, multiplication by $G(\omega)$ must compensate for the $1/\sqrt{T}$ factor in the $k = 0$ term of (4.54) and zero out all the other terms.

Whether the multiplication by $G(\omega)$ will recover $X(\omega)$ depends on whether the spectral replicas in (4.54) overlap. Figure 4.19 illustrates the two possibilities, which we now discuss in more detail.

Aliasing When $\omega_0 > 2\pi/T$ as in Figure 4.19(a), spectral replicas overlap, and no LSI filtering will succeed in recovering $x(t)$ for every $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$. As we have seen, this confusion of frequencies is called aliasing. A graphic example is found in old Western movies, shot at 24 frames/s, where the spokes of wagon wheels seem to be turning backwards due to aliasing (Exercise 4.9).

81 As for sequences, this means that the sampling prefilter is not present; multiplication by $\sqrt{T}$ is purely for convenience.

(a) Aliasing. (b) No aliasing.

Figure 4.19: Sampling of $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$ at $t = nT$, followed by multiplication by a Dirac delta comb. (a) When $\omega_0 > 2\pi/T$, spectral replicas overlap; $x(t)$ cannot be recovered by LSI filtering for every $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$. (b) When $\omega_0 < 2\pi/T$, spectral replicas do not overlap; $x(t)$ can be recovered by lowpass filtering by $g(t)$. (Illustrated for $T = 4$.)

Example 4.14 (Aliasing of sinusoids) Let $x(t) = \cos(\omega_0 t/2)$, where $\omega_0/2$ is the frequency in radians/s, and let $\omega_s = 2\pi/T$ be the sampling frequency. In the Fourier domain,
$$\cos\Big(\frac{\omega_0}{2}t\Big) = \frac{e^{j\frac{\omega_0}{2}t} + e^{-j\frac{\omega_0}{2}t}}{2} \quad\overset{\text{FT}}{\longleftrightarrow}\quad \pi\Big(\delta\Big(\omega - \frac{\omega_0}{2}\Big) + \delta\Big(\omega + \frac{\omega_0}{2}\Big)\Big). \qquad (4.55a)$$
Sampling $x(t)$ with period $T$, we get
$$\cos\Big(\frac{\omega_0}{2}t\Big) \sum_{n\in\mathbb{Z}} \delta(t - nT) \quad\overset{\text{FT}}{\longleftrightarrow}\quad \frac{\omega_s}{2} \sum_{k\in\mathbb{Z}} \Big(\delta\Big(\omega - k\omega_s - \frac{\omega_0}{2}\Big) + \delta\Big(\omega - k\omega_s + \frac{\omega_0}{2}\Big)\Big). \qquad (4.55b)$$
We thus see that $\cos(\omega_\ell t)$ with $\omega_\ell = \omega_0/2 + \ell\omega_s$, $\ell \in \mathbb{Z}$, will have the same Fourier spectrum after sampling, that is, the same samples. Indeed, if $x(t) = \cos(\omega_0 t/2)$ and $x'(t) = \cos((\omega_0/2 + \ell\omega_s)t)$, then
$$x'(nT) = \cos\Big(\Big(\frac{\omega_0}{2} + \ell\omega_s\Big) n \frac{2\pi}{\omega_s}\Big) = \cos\Big(\frac{\omega_0}{2} nT + 2\pi\ell n\Big) = x(nT);$$
we are not able to tell from which cosine function the samples came (see Figure 4.20 for an example).
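The following sketch reproduces the situation of Figure 4.20 numerically, with $\omega_0 = \pi$ and $\omega_s = 2\pi$ ($T = 1$). The samples of $\cos(\omega_0 t/2)$ and $\cos((\omega_0/2 + \ell\omega_s)t)$ coincide for several integers $\ell$, while the functions themselves differ between the sampling instants. The function names and the tested values of $\ell$ are illustrative choices.

```python
import math

w0 = math.pi           # x(t) = cos(w0 t / 2) = cos(pi t / 2), as in Figure 4.20
ws = 2 * math.pi       # sampling frequency
T = 2 * math.pi / ws   # sampling period, here T = 1

def x(t):
    return math.cos(w0 * t / 2)

def x_alias(t, l):
    """cos((w0/2 + l*ws) t): a different cosine for every integer l != 0."""
    return math.cos((w0 / 2 + l * ws) * t)

# Same samples at t = nT for every l tested ...
sample_gap = max(abs(x(n * T) - x_alias(n * T, l)) for n in range(-20, 21) for l in (-2, -1, 1, 3))
# ... but different values between sampling instants.
midpoint_gap = abs(x(0.5 * T) - x_alias(0.5 * T, 1))
print(sample_gap, midpoint_gap)
```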

$\cos(\pi t/2)$, $\cos((\pi/2 + 2\pi)t)$

Figure 4.20: Illustration of aliasing. Sampling of two cosine functions, $\cos(\omega_0 t/2)$ and $\cos((\omega_0/2 + \omega_s)t)$, with $\omega_0 = \pi$ and sampling frequency $\omega_s = 2\pi$, or $T = 1$, produces the same samples.

(c) $\omega_s = 3\pi/2$ (d) $\omega_s = \pi$

Figure 4.21: A triangle spectrum and various sampled versions. (a) Original spectrum as in (4.56). (b) Spectrum of the sampled function with sampling frequency $\omega_s = 2\pi$. (c) Spectrum of the undersampled function with sampling frequency $\omega_s = 3\pi/2$. (d) Spectrum of the undersampled function with sampling frequency $\omega_s = \pi$.

Example 4.15 (Aliased spectra) Let $x(t)$ be a function with the Fourier transform as in Figure 4.21(a),
$$X(\omega) = \begin{cases} 1 - |\omega|/\pi, & |\omega| < \pi; \\ 0, & \text{otherwise} \end{cases} \quad\overset{\text{FT}}{\longleftrightarrow}\quad x(t) = \frac{1}{2}\operatorname{sinc}^2\Big(\frac{\pi}{2}t\Big). \qquad (4.56)$$
The hat function is a convolution of two box functions, as in Example 3.3. In the same way, this spectrum can be seen as a convolution of two halfband filters as in Table 3.6, and thus $x(t)$ is a product of two sinc functions. Sampling $x(t)$ at $\omega_s = 2\pi$, the triangles stack next to each other with no overlap, as in Figure 4.21(b). If we undersample, the triangles start to overlap, and the original function cannot be recovered from the samples, as in Figure 4.21(c) and (d). Take $\omega_s = \pi$ ($T = 2$). In the spectral domain, the triangles sum up to the constant $1/2$ (from (4.51)), as in Figure 4.21(d). In the time domain, this corresponds to a Kronecker delta sequence, $x(2n) = \frac{1}{2}\delta_n$, which also follows directly from (4.56).
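A quick numerical check of Example 4.15, using the closed form in (4.56) with $\operatorname{sinc}(x) = \sin(x)/x$. With $T = 2$, the replicas in (4.51) add up to $1/2$ at every frequency tested, and the samples $x(2n)$ form $\frac{1}{2}\delta_n$ up to rounding. The test frequencies and helper names are our own choices.

```python
import math

def sinc(x):
    """sin(x)/x with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def x(t):
    """Time-domain side of (4.56)."""
    return 0.5 * sinc(math.pi * t / 2) ** 2

def X(w):
    """Triangle spectrum of (4.56)."""
    return max(0.0, 1 - abs(w) / math.pi)

T = 2.0   # omega_s = 2*pi/T = pi: undersampling, the case of Figure 4.21(d)

# Spectrum of the sampled function from (4.51): (1/T) sum_k X(w - 2*pi*k/T).
Ys = [sum(X(w - 2 * math.pi * k / T) for k in range(-5, 6)) / T for w in (-2.0, -0.3, 0.0, 1.1, 3.0)]
samples = [x(n * T) for n in range(-4, 5)]
print(Ys)        # every entry approximately 1/2
print(samples)   # approximately (1/2) delta_n: 0.5 at n = 0, about 0 elsewhere
```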

Figure 4.22: Illustration of the sampling theorem with $T = 1$. (a) The Fourier-domain function $X(\omega)$, supported on $[-\omega_0/2, \omega_0/2] = [-\pi, \pi]$. (b) Spectrum of the sampled function (solid line), with sampling period $T = 2\pi/\omega_0 = 1$, or, equivalently, sampling frequency $\omega_s = \omega_0 = 2\pi/T = 2\pi$. Spectral repetitions do not overlap. The interpolated version uses an ideal lowpass filter (dashed line) to extract the base spectrum $[-\omega_0/2, \omega_0/2] = [-\pi, \pi]$. (c) The time-domain function $x(t)$. (d) The original function $x(t)$ (dashed line) reconstructed using sinc interpolators (solid lines).

Sampling Theorem When $\omega_0 \le 2\pi/T$ as in Figure 4.19(b), spectral replicas do not overlap, and obtaining $\hat{x}(t) = x(t)$ requires $G(\omega)$ to be an ideal filter with cutoff frequency $\omega_0/2$. Choosing exactly $\omega_0 = 2\pi/T$ uniquely determines the postfilter as
$$G(\omega) = \begin{cases} \sqrt{T}, & |\omega| \le \pi/T; \\ 0, & \text{otherwise}, \end{cases} \quad\overset{\text{FT}}{\longleftrightarrow}\quad g(t) = \frac{1}{\sqrt{T}} \operatorname{sinc}\Big(\frac{\pi}{T}t\Big), \qquad (4.57)$$
an ideal lowpass filter from Table 2.5. Intuitively, this tells us that $x(t)$ can be recovered exactly from its values at $t = nT$ when its bandwidth is at most $2\pi/T$. Since $\{g(t - kT) = (1/\sqrt{T}) \operatorname{sinc}(\pi(t - kT)/T)\}_{k\in\mathbb{Z}}$ are orthonormal, we can choose the sampling prefilter to be $g(-t)$,$^{82}$ leading to an orthogonal projection operator $P$. Because $x(t) \in \mathrm{BL}[-\pi/T, \pi/T]$, this sampling prefilter has no effect on $x(t)$. By construction, the orthonormal set $\{g(t - kT)\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $\mathrm{BL}[-\omega_0/2, \omega_0/2]$.

The frequency $\omega_s = 2\pi/T$ is called the sampling frequency. We see that it equals twice the maximum frequency $\omega_0/2$ of the spectrum of the input function; this minimal sampling frequency is often called the Nyquist rate.

This discussion leads us to one of the cornerstone results in signal processing, the sampling theorem,$^{83}$ illustrated in Figure 4.22:



82 We choose a time-reversed version to yield an orthogonal projection operator $P$. This time reversal has no effect on the ideal filter since its impulse response is symmetric.

83 The sampling theorem was formulated and proved by a number of scientists and could bear

Theorem 4.14 (Sampling theorem) Given is the system as in Figure 4.4(b) with interpolation postfilter $g(t)$ from (4.57). Then,
$$x(t) = \sum_{n\in\mathbb{Z}} x(nT) \operatorname{sinc}\Big(\frac{\pi}{T}(t - nT)\Big) \quad\Longleftrightarrow\quad x(t) \in \mathrm{BL}\Big[-\frac{\pi}{T}, \frac{\pi}{T}\Big]. \qquad (4.58)$$

Proof. Using (4.51), express the reconstructed function in the Fourier domain as
$$\hat{X}(\omega) = G(\omega)\, Y_s(\omega) = G(\omega)\, \frac{1}{\sqrt{T}} \sum_{k\in\mathbb{Z}} X\Big(\omega - \frac{2\pi}{T}k\Big).$$
This multiplication simplifies greatly because of the supports of $G(\omega)$ and $X(\omega)$. Since $x(t) \in \mathrm{BL}[-\pi/T, \pi/T]$, only the $k = 0$ term in the summation has a support that overlaps with the $[-\pi/T, \pi/T]$ support of $G(\omega)$. Using the scale factor $\sqrt{T}$ from (4.57), we get simply $\hat{X}(\omega) = X(\omega)$. Note that the sampling prefilter $g(-t)$ has no effect on the function since it is already bandlimited.

Exercise 4.12 establishes the completeness of $g(t)$ and its shifts by integer multiples of $T$ as an orthonormal basis for $\mathrm{BL}[-\pi/T, \pi/T]$. Then, (4.58) simply expresses $x(t) \in \mathrm{BL}[-\pi/T, \pi/T]$ as an orthonormal expansion in the orthonormal basis $\{(1/\sqrt{T}) \operatorname{sinc}(\pi(t - kT)/T)\}_{k\in\mathbb{Z}}$.
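The sketch below checks (4.58) at a few off-grid instants. It assumes the test function $x(t) = \frac{1}{2}\operatorname{sinc}^2(\pi t/2)$ from (4.56), which lies in $\mathrm{BL}[-\pi, \pi]$, and truncates the series to $|n| \le K$. With $T = 1$ the series reproduces $x(t)$ up to a small truncation error. With $T = 2$ the function is undersampled and the reconstruction visibly fails.

```python
import math

def sinc(x):
    """sin(x)/x with sinc(0) = 1, the convention of (4.57)-(4.58)."""
    return 1.0 if x == 0 else math.sin(x) / x

def x(t):
    """Assumed test function in BL[-pi, pi]: the triangle-spectrum example (4.56)."""
    return 0.5 * sinc(math.pi * t / 2) ** 2

def reconstruct(t, T, K=4000):
    """Right side of (4.58), with the sum truncated to |n| <= K."""
    return sum(x(n * T) * sinc(math.pi * (t - n * T) / T) for n in range(-K, K + 1))

ts = (-2.45, -0.5, 0.3, 1.7, 3.14)
err_T1 = max(abs(reconstruct(t, 1.0) - x(t)) for t in ts)   # T = 1: x is in BL[-pi/T, pi/T]
err_T2 = max(abs(reconstruct(t, 2.0) - x(t)) for t in ts)   # T = 2: undersampled, aliasing
print(err_T1, err_T2)
```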

The sampling theorem gives a sufficient condition for reconstruction of a function from its samples. A function with bandwidth $2\pi/T$ can be recovered from samples taken with period at most $T$, that is, at a sampling frequency of at least $\omega_s = 2\pi/T$. This condition is not always necessary, as shown in the following example.

Example 4.16 (Bandpass sampling) Let $x(t)$ be a function with the Fourier transform as in Figure 4.23(a),
$$X(\omega) = \begin{cases} 3 - |\omega|/\pi, & 2\pi < |\omega| < 3\pi; \\ 0, & \text{otherwise}. \end{cases}$$
Since the maximum frequency is $3\pi$, one might think that a sampling frequency of $6\pi$ is required. Figure 4.23 shows this not to be true and that a sampling frequency of $\omega_s = 2\pi$ is sufficient. The spectrum of the sampled version $X_s(\omega)$, with $T = 2\pi/\omega_s = 1$, is simply
$$X_s(\omega) = \sum_{k\in\mathbb{Z}} X(\omega - 2\pi k).$$
The various parts fill the spectrum without overlapping with each other, creating a triangular spectrum with periodicity $2\pi$. For reconstruction, an ideal bandpass filter carves out the correct spectrum on the intervals $2\pi < |\omega| < 3\pi$.
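The sketch below mirrors Example 4.16 with a test signal of our own choosing (an assumption): $\cos(5\pi t/2)\operatorname{sinc}^2(\pi t/4)$. Its spectrum is a triangle of half-width $\pi/2$ centered at $\pm 5\pi/2$, hence confined to $2\pi \le |\omega| \le 3\pi$. Samples are taken at $T = 1$ ($\omega_s = 2\pi$). Interpolating with the ideal bandpass filter of gain $T = 1$ on $2\pi < |\omega| < 3\pi$ recovers the signal; this filter's impulse response is $(\sin 3\pi t - \sin 2\pi t)/(\pi t)$. The lowpass interpolator of (4.58) instead returns a different, baseband function with the same samples.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(x) / x

def x(t):
    """Assumed test signal with spectrum inside 2*pi <= |w| <= 3*pi: a triangle of
    half-width pi/2 (from sinc^2(pi t / 4)) modulated to the band centre 5*pi/2."""
    return math.cos(2.5 * math.pi * t) * sinc(math.pi * t / 4) ** 2

def h_bandpass(t):
    """Impulse response of the ideal bandpass filter G(w) = 1 (= T) on 2*pi < |w| < 3*pi."""
    return 1.0 if t == 0 else (math.sin(3 * math.pi * t) - math.sin(2 * math.pi * t)) / (math.pi * t)

def h_lowpass(t):
    """Ideal lowpass interpolator of (4.58) for T = 1: sinc(pi t)."""
    return sinc(math.pi * t)

def interpolate(h, t, K=4000):
    """Sum of x(n) h(t - n) over |n| <= K, with samples taken at T = 1 (omega_s = 2*pi)."""
    return sum(x(n) * h(t - n) for n in range(-K, K + 1))

ts = (-1.3, 0.37, 1.21, 2.6)
err_bandpass = max(abs(interpolate(h_bandpass, t) - x(t)) for t in ts)
err_lowpass = max(abs(interpolate(h_lowpass, t) - x(t)) for t in ts)
print(err_bandpass, err_lowpass)   # the bandpass filter recovers x; the lowpass one does not
```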



Exercise 4.13 explores bandpass sampling further, showing an orthonormal basis interpretation, a modulation-based solution, and generalizations. The main idea concerns bandpass functions with a total frequency support of $\omega_0$ on two intervals of size $\omega_0/2$ each. For these, a sampling frequency of $\omega_s = \omega_0$ can be sufficient when the intervals are suitably positioned, as in Example 4.16.



all their names: Shannon, Kotelnikov, Raabe, Whittaker and Someya; see Historical Remarks for more details.

Figure 4.23: Bandpass sampling. (a) Original spectrum $X(\omega)$ in passbands $2\pi < |\omega| < 3\pi$. (b) Spectrum $X_s(\omega)$ of the sampled function with sampling frequency $\omega_s = 2\pi$, filtered with an ideal bandpass filter $G(\omega)$.

Bandlimited Approximation of Functions We now assume that $x(t) \notin \mathrm{BL}[-\pi/T, \pi/T]$. Then, as a corollary to Theorem 4.13, since $P$ is an orthogonal projection operator and $S = \mathrm{BL}[-\pi/T, \pi/T]$:

Theorem 4.15 (Best least-squares bandlimited approximation) Given is the system as in Figure 4.4(b) with interpolation postfilter $g(t)$ from (4.57). Then,
$$\hat{x}(t) = (\Phi\Phi^* x)(t) \qquad (4.60)$$
is the best least-squares approximation of $x(t)$ in $\mathrm{BL}[-\pi/T, \pi/T]$, that is,
$$\hat{x}(t) = \arg\min_{x_{\mathrm{BL}}(t) \in \mathrm{BL}[-\pi/T, \pi/T]} \|x(t) - x_{\mathrm{BL}}(t)\|_2, \qquad x(t) - \hat{x}(t) \perp \mathrm{BL}[-\pi/T, \pi/T].$$

The effect of this approximation in the Fourier domain is a simple truncation of the spectrum of $x(t)$ to $[-\pi/T, \pi/T]$:
$$\hat{X}(\omega) = \begin{cases} X(\omega), & |\omega| \le \pi/T; \\ 0, & \text{otherwise}. \end{cases}$$



Continuous-Time Processing Using Discrete-Time Operators One more key result around sampling is mathematically simple but has broad technological impact. It shows how to implement continuous-time signal processing operations using discrete-time ones. In particular, we show this for convolution.



Proposition 4.16 (CT convolution implemented using DT processing)
Let $x(t) \in BL[-\omega_0/2, \omega_0/2]$ and $T = 2\pi/\omega_0$. The continuous-time convolution

$$y(t) = (g * x)(t)$$

can be computed using the discrete-time convolution $y_n = (g * x)_n$ via

$$y(t) = \frac{1}{\sqrt{T}} \sum_{n \in \mathbb{Z}} y_n \, \mathrm{sinc}\left(\frac{\pi}{T}(t - nT)\right), \tag{4.61a}$$

with

$$x_n = \sqrt{T}\, x(nT), \tag{4.61b}$$

$$g_n = \left\langle g(t), \mathrm{sinc}\left(\frac{\pi}{T}(t - nT)\right) \right\rangle_t. \tag{4.61c}$$



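Proof. It is easiest to prove this in the Fourier domain. We want to show that

$$Y(\omega) = G(\omega)X(\omega)$$

can be obtained as an interpolated discrete-time convolution. Since $x(t) \in BL[-\omega_0/2, \omega_0/2]$,
the spectrum $G(\omega)$ matters only on this support.
Let us calculate $Y(e^{j\omega})$,

$$Y(e^{j\omega}) = G(e^{j\omega})X(e^{j\omega}) \overset{(a)}{=} G\left(\frac{\omega}{T}\right) \frac{\sqrt{T}}{T} \sum_{k \in \mathbb{Z}} X\left(\frac{\omega - 2\pi k}{T}\right) \overset{(b)}{=} \frac{1}{\sqrt{T}}\, G\left(\frac{\omega}{T}\right) X\left(\frac{\omega}{T}\right), \qquad |\omega| < \pi, \tag{4.62}$$

where in (a) we observed that (4.61c) is the ideal lowpass filtering of $g(t)$ to $[-\omega_0/2, \omega_0/2]$
followed by sampling with frequency $\omega_0$, and for $X(e^{j\omega})$ we used both (4.50) and (4.61b);
and (b) follows from (4.51) restricted to $|\omega| < \pi$.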
We can write $y(t)$ from (4.61a) as

$$y(t) = \Big(\sum_{n \in \mathbb{Z}} y_n\, \delta(t - nT)\Big) * \Big(\frac{1}{\sqrt{T}}\, \mathrm{sinc}\left(\frac{\pi}{T} t\right)\Big),$$

or, in the Fourier domain,

$$Y(\omega) = \begin{cases} \sqrt{T}\, Y(e^{j\omega T}) = G(\omega)X(\omega), & |\omega| < \omega_0/2; \\ 0, & \text{otherwise}, \end{cases}$$

because of both (4.62) as well as the fact that the Fourier transform of $(1/\sqrt{T})\,\mathrm{sinc}(\pi t/T)$
is an ideal lowpass filter as in Table 3.6. Basically, what we did to interpolate $Y(\omega)$
from $Y(e^{j\omega})$ was to rescale $\omega$ so as to get an $\omega_0$-periodic function, and then to cut out
the base spectrum between $-\omega_0/2$ and $\omega_0/2$.

While we showed the result for convolution, other continuous-time signal processing
algorithms having a bandlimited result can also be implemented in discrete time;
see Exercise 4.14.
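Proposition 4.16 can be exercised numerically. The sketch below evaluates (4.61a) to (4.61c) with truncated sums; it assumes, for simplicity, that $g(t)$ is itself bandlimited to $[-\pi/T, \pi/T]$, in which case the inner product (4.61c) reduces to $g_n = T\,g(nT)$. The test signals are illustrative choices: with $x(t) = \mathrm{sinc}^2(\pi t/4)$ and $g(t) = \mathrm{sinc}(\pi t/2)$, $G(\omega) = 2$ on the support of $X(\omega)$, so the exact output is $y(t) = 2x(t)$.

```python
import math

def sinc(x):
    """sin(x)/x with sinc(0) = 1, the convention used in (4.61)."""
    return 1.0 if x == 0 else math.sin(x) / x

def ct_conv_via_dt(x, g, T, t, M=200):
    """Approximate y(t) = (g * x)(t) through (4.61a)-(4.61c).

    Assumption of this sketch: g is bandlimited to [-pi/T, pi/T], so (4.61c)
    reduces to g_n = T g(nT). Sums are truncated: y_n is formed for |n| <= M
    from taps g_k with |k| <= 3M.
    """
    K = 3 * M
    x_n = {n: math.sqrt(T) * x(n * T) for n in range(-M - K, M + K + 1)}  # (4.61b)
    g_n = {k: T * g(k * T) for k in range(-K, K + 1)}                     # (4.61c)
    # discrete-time convolution y_n = sum_k g_k x_{n-k}
    y_n = {n: sum(g_n[k] * x_n[n - k] for k in range(-K, K + 1))
           for n in range(-M, M + 1)}
    # sinc interpolation (4.61a)
    return sum(y_n[n] * sinc(math.pi / T * (t - n * T)) for n in y_n) / math.sqrt(T)

def x(t):  # illustrative input, bandlimited to [-pi/2, pi/2]
    return sinc(math.pi * t / 4) ** 2

def g(t):  # illustrative filter, G(w) = 2 on [-pi/2, pi/2]
    return sinc(math.pi * t / 2)

print(ct_conv_via_dt(x, g, 1.0, 0.5), 2 * x(0.5))
```

Any mismatch between `ct_conv_via_dt(x, g, T, t)` and `2 * x(t)` comes only from truncating the sums; the parameter `M` trades accuracy for run time.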

Approximations to Ideal Filters In all our developments, we used ideal filters, both
as prefilters to obtain perfectly bandlimited functions and as postfilters to perfectly
interpolate bandlimited functions. However, ideal filters cannot be implemented;
moreover, they decay slowly in the time domain, as $1/t$. The solution is
to use filters that are smoother in the frequency domain than the ideal filters. While such
filters are more realistic, they will either lead to approximate reconstruction of the
input function after sampling and interpolation, or will require oversampling.
[Figure 4.24 panels: (a) Filter, $|G(\omega)|$. (b) Oversampling, $|G(\omega)|$ and $|X(\omega)|$. (c) Critical sampling, $|G(\omega)|$ and $|X(\omega)|$.]

Figure 4.24: Nonideal, but faster decaying filters and oversampling. (a) Filter $G(\omega)$
with continuous spectrum obtained from convolving two ideal filters of bandwidths $\omega_0 + \alpha$ and
$\alpha$. (b) Input spectrum $X(\omega)$ with support $[-\omega_0/2, \omega_0/2]$ and oversampling with sampling
frequency $\omega_s = \omega_0 + \alpha$ to allow perfect interpolation. (c) Input spectrum $X(\omega)$ with
support $[-\omega_0/2 - \alpha, \omega_0/2 + \alpha]$ and critical sampling with sampling frequency $\omega_s = \omega_0 + 2\alpha$,
leading to imperfect reconstruction at the boundaries.



Example 4.17 (Approximations to ideal filters) Consider a filter that is
lowpass, but instead of being ideal, it has a spectrum that is continuous, for
example,

$$G(\omega) = \begin{cases} 1, & |\omega| < \omega_0/2; \\ 1 - (|\omega| - \omega_0/2)/\alpha, & \omega_0/2 \le |\omega| < \omega_0/2 + \alpha; \\ 0, & \text{otherwise}, \end{cases} \tag{4.63a}$$

as in Figure 4.24(a). One way to obtain such a filter is to convolve two ideal
filters in frequency, one with cut-off frequency of $(\omega_0 + \alpha)/2$ and gain 1, and the other
with cut-off frequency of $\alpha/2$ and gain $1/\alpha$, where $\alpha \ll \omega_0$. Thus, in time, the
impulse response is the product of the two corresponding sinc functions,

$$g(t) = \frac{\omega_0 + \alpha}{2\pi}\, \mathrm{sinc}\left(\frac{\omega_0 + \alpha}{2}\, t\right) \mathrm{sinc}\left(\frac{\alpha}{2}\, t\right). \tag{4.63b}$$



This impulse response decays faster, as $1/t^2$, but uses more bandwidth, since the
support of $G(\omega)$ is $\omega_0 + 2\alpha$. Then, even if $X(\omega)$ is bandlimited to $[-\omega_0/2, \omega_0/2]$,
sampling and reconstruction using $G(\omega)$ require a sampling frequency of
$\omega_s = \omega_0 + \alpha$, that is, an oversampling by $\alpha/\omega_0$ relative to the critical rate $\omega_0$. If $X(\omega)$ is
bandlimited to $[-\omega_0/2 - \alpha, \omega_0/2 + \alpha]$ and sampled at $\omega_s = \omega_0 + 2\alpha$, we lose the tail of the
spectrum due to the decay of $G(\omega)$ at the boundaries. These two cases are shown in Figure 4.24(b)
and (c).
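The pair (4.63a), (4.63b) can be cross-checked by computing the inverse Fourier transform of the trapezoid numerically and comparing it with the product of sincs. The sketch below uses a midpoint rule, borrows $\omega_0/2 = 6.8\pi$ and $\alpha = 1.2\pi$ krad/s from Example 4.18 as sample values, and its helper names are hypothetical.

```python
import math

def sinc(x):
    """sin(x)/x with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def G(w, w0, alpha):
    """Trapezoidal lowpass spectrum (4.63a)."""
    a = abs(w)
    if a <= w0 / 2:
        return 1.0
    if a <= w0 / 2 + alpha:
        return 1.0 - (a - w0 / 2) / alpha
    return 0.0

def g_closed_form(t, w0, alpha):
    """Ideal lowpass of cut-off (w0 + alpha)/2 times one of cut-off alpha/2, gain 1/alpha."""
    return (w0 + alpha) / (2 * math.pi) * sinc((w0 + alpha) * t / 2) * sinc(alpha * t / 2)

def g_numeric(t, w0, alpha, n=40000):
    """(1/2pi) * integral of G(w) cos(w t) dw over the support of G (midpoint rule).

    G is real and even, so the sine part of exp(j w t) integrates to zero.
    """
    W = w0 / 2 + alpha
    h = 2 * W / n
    total = 0.0
    for i in range(n):
        w = -W + (i + 0.5) * h
        total += G(w, w0, alpha) * math.cos(w * t)
    return total * h / (2 * math.pi)

w0, alpha = 13.6 * math.pi, 1.2 * math.pi   # sample values from Example 4.18 (krad/s)
for t in (0.0, 0.1, 0.37):
    print(t, g_numeric(t, w0, alpha), g_closed_form(t, w0, alpha))
```

At $t = 0$ both give the area of the trapezoid divided by $2\pi$, namely $(\omega_0 + \alpha)/(2\pi)$.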

In summary, having a smooth interpolation filter in frequency has a cost,
since only the ideal filter allows critical sampling at the minimum sampling frequency
together with perfect reconstruction. In practice, most functions of interest
have decaying spectra towards their band limit, and thus, the effect of
an imperfect reconstruction near the boundary (Figure 4.24(c)) is usually not
severe. Solved Exercise 4.4 explores this topic further.

We now describe a practical application of the concepts discussed so far, speech 
processing in mobile phones. 
[Figure 4.25 panels: (a) Digital implementation of filtering. (b) Discrete-time filter in continuous time. (c) Analog implementation of filtering.]

Figure 4.25: Analog versus digital implementation of filtering. (a) The input is prefiltered
with $g(t)$ and sampled with sampling period $T$. This sampled version is filtered
with a discrete-time lowpass filter $h_n$. Finally, the samples are translated to the analog,
continuous-time domain and postfiltered with $g(t)$. (b) Continuous-time filter $g_c(t)$ is
obtained from the discrete-time one $h_n$. (c) The input function $x(t)$ is convolved with
$g_{\mathrm{tot}}(t) = (g * g_c * g)(t)$ to yield the output $y(t)$.



Example 4.18 (Speech processing in mobile phones) The bandlimited assumption
used in speech and audio processing is based on the fact that humans
cannot hear frequencies above 20 kHz. Thus, music for compact disks is sampled
at 44.1 kHz, with a lowpass filter having a passband from $-20$ kHz to 20 kHz and a
transition band of about 2 kHz. For speech, in telephone applications where bandwidth
has always been at a premium, a passband from 0.3 to 3.4 kHz is sufficient for
good-quality speech. A sampling frequency of $f_s = 8$ kHz is used, and the analog
pre- and postfilters have a passband of up to 3.4 kHz, followed by a smooth
transition to very high attenuation between 3.4 and 4 kHz. For the sake of this
example, we assume that these analog filters have a Fourier spectrum like $G(\omega)$ in
(4.63a), with $\omega_0/2 = 2\pi \cdot 3.4 = 6.8\pi$ krad/s and $\alpha = 1.2\pi$ krad/s. The sampling
frequency is then $\omega_s = 2\pi f_s = 2\pi \cdot 8 = 16\pi$ krad/s, with $T = 2\pi/\omega_s = 0.125$
ms sampling period. We now show how filtering in the analog (continuous-time)
domain can be implemented using filtering in the digital (discrete-time) domain,
as depicted in Figure 4.25(a). For simplicity, in what follows and in the figures,
we will express everything in terms of angular frequency in krad/s (that is, $10^3$
rad/s).
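The parameters of this example fit together in a way worth checking: the prefilter (4.63a) reaches zero at $\omega_0/2 + \alpha = 8\pi$ krad/s, which is exactly $\omega_s/2$, so the prefiltered input is bandlimited to the Nyquist interval. The arithmetic below (plain Python, frequencies in krad/s, times in ms) confirms this along with $T = 0.125$ ms.

```python
import math

f_s = 8.0                        # sampling frequency in kHz
w_s = 2 * math.pi * f_s          # angular sampling frequency in krad/s (16*pi)
T_ms = 2 * math.pi / w_s         # sampling period; 1/(krad/s) = ms
half_w0 = 2 * math.pi * 3.4      # omega_0/2 in krad/s (6.8*pi)
alpha = 1.2 * math.pi            # transition width in krad/s

stop_edge = half_w0 + alpha      # G(omega) in (4.63a) vanishes beyond this
print(f"T = {T_ms:.3f} ms, stop edge = {stop_edge / math.pi:.1f} pi, "
      f"w_s/2 = {w_s / 2 / math.pi:.1f} pi")
```

This prints `T = 0.125 ms, stop edge = 8.0 pi, w_s/2 = 8.0 pi`.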

For the input function, assume an exponentially decaying spectrum depicted in Figure 4.27(a),

$$X(\omega) = e^{-2|\omega|/\omega_0}, \tag{4.64}$$

thus $X(0) = 1$ and $X(\pm\omega_0/2) = 1/e$. Before sampling, we prefilter with $G(\omega)$
from (4.63a), depicted in Figure 4.26(b); the result of this operation is shown in
Figure 4.27(b). We can now sample the resulting signal, yielding a discrete-time
signal whose spectrum is depicted in Figure 4.27(c).

[Figure 4.26 panels: (a) Digital filter, $|H(e^{j\omega})|$. (b) Analog pre/postfilter, $|G(\omega)|$. (c) Analog version of the digital filter, $|H_c(\omega)|$. (d) Equivalent analog filter, $|G_{\mathrm{tot}}(\omega)|$.]

Figure 4.26: Filters involved in analog and digital implementation of filtering in
Figure 4.25. The sampling period is $T = 0.125$ ms, or, the sampling frequency is $\omega_s = 16\pi$
krad/s; frequency axes are in $10^3$ rad/s.

We can now filter in the discrete-time domain. Assume we implement a
discrete-time lowpass filter that approximates an ideal halfband filter,

$$H(e^{j\omega}) = \begin{cases} 1, & |\omega| < 3\pi/8; \\ 10^{-1}, & 5\pi/8 < |\omega| < \pi, \end{cases} \tag{4.65}$$

the band $3\pi/8 < |\omega| < 5\pi/8$ being a transition band where the filter is unspecified;
this filter is depicted in Figure 4.26(a). It corresponds, by frequency
scaling (4.50), to a lowpass in continuous time depicted in Figure 4.26(c) with a
frequency response

$$H_c(\omega) = H(e^{j\omega T}) = H(e^{j\omega/(8 \cdot 10^3)}), \qquad |\omega| < 8\pi \cdot 10^3,$$

or a passband up to $\omega = 3\pi$ krad/s, since $\omega T = \omega/(8 \cdot 10^3) = 3\pi/8$ at $\omega = 3\pi \cdot 10^3$ rad/s.
The discrete-time lowpass filter can be designed using Parks-McClellan optimization;
Figure 4.26(a) gives just a conceptual plot.

Finally, the interpolation postfilter $G(\omega)$ cancels the repeated spectra
to interpolate the digitally filtered version of the function. Therefore, the overall
effect in the analog, continuous-time domain is the product of $H_c(\omega)$ and the square
of $G(\omega)$, since it is applied as both a pre- and postfilter,

$$G_{\mathrm{tot}}(\omega) = \begin{cases} H_c(\omega)\, G^2(\omega), & |\omega| < 8\pi \cdot 10^3; \\ 0, & |\omega| \ge 8\pi \cdot 10^3. \end{cases}$$
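The equivalent analog response can be tabulated directly from (4.63a), (4.65), and the frequency scaling $H_c(\omega) = H(e^{j\omega T})$. In this sketch, frequencies are in krad/s and $T$ in ms; the stopband level $10^{-1}$ is taken from (4.65), and returning `None` inside the unspecified transition band $3\pi/8 < |\omega T| < 5\pi/8$ is a choice of this illustration.

```python
import math

T = 1 / 8.0                                 # sampling period in ms
HALF_W0, ALPHA = 6.8 * math.pi, 1.2 * math.pi

def G(w):
    """Analog pre/postfilter, trapezoid (4.63a)."""
    a = abs(w)
    if a <= HALF_W0:
        return 1.0
    if a <= HALF_W0 + ALPHA:
        return 1.0 - (a - HALF_W0) / ALPHA
    return 0.0

def H(w):
    """Digital filter (4.65) on |w| <= pi; None inside the unspecified transition band."""
    a = abs(w)
    if a < 3 * math.pi / 8:
        return 1.0
    if a > 5 * math.pi / 8:
        return 0.1
    return None

def G_tot(w):
    """Equivalent analog response H_c(w) G(w)^2, with H_c(w) = H(e^{jwT}) for |w| < pi/T."""
    if abs(w) >= math.pi / T:
        return 0.0
    h = H(w * T)
    return None if h is None else h * G(w) ** 2

for w in (0.0, 2 * math.pi, 5.5 * math.pi, 7.4 * math.pi, 9 * math.pi):
    print(f"{w / math.pi:4.1f} pi krad/s -> {G_tot(w)}")
```

Near $7.4\pi$ krad/s the digital stopband and the analog roll-off combine, since $G(\omega)$ enters squared.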



The key in this process is the rescaling of the frequency axis, which maps
the $[-\pi, \pi]$ interval of the DTFT to the $[-\omega_s/2, \omega_s/2]$ interval of the Fourier
transform. Note that the scale factor from input to output, while mathematically
specified by factors such as $1/T$ in (4.51) and gains in analog and digital filters,
depends on implementation issues such as analog amplifiers and A/D and D/A
converter scaling factors.

[Figure 4.27 panels: (b) Prefiltering. (c) Sampling. (d) Filtering in digital domain. (e) Output spectrum $Y(\omega)$.]

Figure 4.27: Analog versus digital implementations of filtering in spectral domain.
Filtering the input spectrum $X(\omega)$ in the analog domain with an analog filter $G_{\mathrm{tot}}(\omega)$
from Figure 4.26(d) yields the output $Y(\omega)$. The same can be achieved by prefiltering
with an analog filter $G(\omega)$ from Figure 4.26(b), sampling, filtering with a digital filter
$H(e^{j\omega})$ from Figure 4.26(a), and interpolating using the postfilter $G(\omega)$. Solid black lines
are signals at each point in the system from Figure 4.25(a), dashed black lines indicate
the spectrum to be filtered while the dashed red lines indicate the corresponding filters.
Frequency axes are in kHz.



Multichannel Sampling The classic sampling result in Theorem 4.14 has seen
many extensions and generalizations, of which we give but a sample. Recall the
sampling operator in Figure 4.3(a), but now we have two branches with filtering
and sampling, yielding a two-channel system as in Figure 4.28. We will call this a
two-channel filter bank.

Assume for a moment that $X(\omega)$ is bandlimited to $[-\pi, \pi]$; this means it
can be sampled with sampling frequency $\omega_s = 2\pi$, or, sampling period $T = 1$.
Figure 4.28: Multichannel sampling for $N = 2$. The sampling operator consists of two
branches with filtering and sampling in parallel, producing two sampled outputs.



However, since we have access to two sampled outputs, it is intuitive that under
certain conditions on the filters $G_0(\omega)$ and $G_1(\omega)$, sampling the two channels with
a sampling period $2T$ would be a sufficient representation of $X(\omega)$. To see whether
this is true, write the spectrum $Y_i(\omega)$, $i = 0, 1$, in terms of $G_i(\omega)$ and $X(\omega)$,

$$Y_i(\omega) \overset{(a)}{=} \sum_{k \in \mathbb{Z}} G_i(\omega + k\pi)\, X(\omega + k\pi) \overset{(b)}{=} \sum_{k \in \mathbb{Z}} G_i(\omega + 2\pi k)\, X(\omega + 2\pi k) + \sum_{k \in \mathbb{Z}} G_i(\omega + 2\pi k + \pi)\, X(\omega + 2\pi k + \pi),$$

where (a) follows from (4.51); and in (b) we split the sum into even-indexed and
odd-indexed terms. Since $Y_i(\omega)$ is periodic with period $\pi$, we can consider only one
interval $[0, \pi]$. Moreover, since $X(\omega)$ is bandlimited to $[-\pi, \pi]$, only two spectral
components overlap on $[0, \pi]$,

$$Y_0(\omega) = G_0(\omega)X(\omega) + G_0(\omega - \pi)X(\omega - \pi),$$
$$Y_1(\omega) = G_1(\omega)X(\omega) + G_1(\omega - \pi)X(\omega - \pi),$$

or, in matrix notation, for $\omega \in [0, \pi]$,

$$\begin{bmatrix} Y_0(\omega) \\ Y_1(\omega) \end{bmatrix} = \underbrace{\begin{bmatrix} G_0(\omega) & G_0(\omega - \pi) \\ G_1(\omega) & G_1(\omega - \pi) \end{bmatrix}}_{G(\omega)} \begin{bmatrix} X(\omega) \\ X(\omega - \pi) \end{bmatrix}. \tag{4.66}$$



We see that as long as the matrix $G(\omega)$ is nonsingular on the interval $[0, \pi]$, we can
recover $X(\omega)$ on $[-\pi, \pi]$. The key is that because $X(\omega)$ is bandlimited, its undersampled
version by a factor 2 contains only two overlapping copies, and having two
versions as in (4.66) allows us to separate them when the matrix $G(\omega)$ is invertible.
We illustrate this in the next two examples.
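The separation argument behind (4.66) amounts to inverting a $2 \times 2$ matrix at each frequency. The sketch below (hypothetical helpers, Cramer's rule) builds $G(\omega)$ from two channel responses, forms $Y_0(\omega)$ and $Y_1(\omega)$ for a test spectrum assumed to be triangular on $[-\pi, \pi]$, and recovers $X(\omega)$ and $X(\omega - \pi)$. It uses the identity and derivative channels of Example 4.19, for which the determinant is the constant $-j\pi$.

```python
import math

def channel_matrix(G0, G1, w):
    """The 2x2 matrix G(w) of (4.66) for channel responses G0, G1."""
    return [[G0(w), G0(w - math.pi)], [G1(w), G1(w - math.pi)]]

def solve_two_channel(G0, G1, Y0, Y1, w):
    """Recover (X(w), X(w - pi)) from channel spectra Y0, Y1 at w in [0, pi] by Cramer's rule."""
    (a, b), (c, d) = channel_matrix(G0, G1, w)
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("G(w) is singular at this frequency")
    return (d * Y0 - b * Y1) / det, (a * Y1 - c * Y0) / det

def G0(w):  # identity channel
    return 1.0

def G1(w):  # derivative channel
    return 1j * w

def X(w):   # illustrative triangular spectrum on [-pi, pi]
    return max(0.0, 1.0 - abs(w) / math.pi)

w = 1.0
Y0 = G0(w) * X(w) + G0(w - math.pi) * X(w - math.pi)
Y1 = G1(w) * X(w) + G1(w - math.pi) * X(w - math.pi)
print(solve_two_channel(G0, G1, Y0, Y1, w), (X(w), X(w - math.pi)))
```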

Example 4.19 (Sampling a function and its derivative) Let $x(t) \in
BL[-\pi, \pi]$, and use a two-channel system as in Figure 4.28 with the identity filter
and the derivative filter,

$$G_0(\omega) = 1, \qquad G_1(\omega) = j\omega,$$

and sampling at $t = 2n$. The spectra of the sampled channel signals are

$$\begin{bmatrix} Y_0(\omega) \\ Y_1(\omega) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ j\omega & j(\omega - \pi) \end{bmatrix} \begin{bmatrix} X(\omega) \\ X(\omega - \pi) \end{bmatrix}.$$

The determinant of the above matrix, $\det(G(\omega)) = -j\pi$, is a constant, making the
system invertible and showing that one can reconstruct a bandlimited function
from twice undersampled versions of the function and its derivative.

Example 4.20 (Simple nonuniform sampling) Let $x(t) \in BL[-\pi, \pi]$, and
use a two-channel system as in Figure 4.28 with the identity filter and the filter
delaying by $\tau \in (0, 2)$,

$$G_0(\omega) = 1, \qquad G_1(\omega) = e^{-j\omega\tau},$$

and sampling at $t = 2n$. The sampled channel signals are

$$\begin{bmatrix} Y_0(\omega) \\ Y_1(\omega) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ e^{-j\omega\tau} & e^{-j(\omega - \pi)\tau} \end{bmatrix} \begin{bmatrix} X(\omega) \\ X(\omega - \pi) \end{bmatrix}.$$

The determinant of the above matrix is $\det(G(\omega)) = e^{-j\omega\tau}(e^{j\pi\tau} - 1)$. This is
different from zero for $\tau \in (0, 2)$, albeit arbitrarily ill-conditioned as $\tau$ approaches 0
or 2. This comes as no surprise, since for either no delay, $\tau = 0$, or a delay of
$\tau = 2$, the samples in the two channels are the same, leading to an undersampled,
aliased sampling of the input. As a sanity check, choose $\tau = 1$, which should
lead to the usual sampling of $x(t)$, with even samples in channel 0, and odd ones
in channel 1. Then,

$$\begin{bmatrix} Y_0(\omega) \\ Y_1(\omega) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ e^{-j\omega} & -e^{-j\omega} \end{bmatrix} \begin{bmatrix} X(\omega) \\ X(\omega - \pi) \end{bmatrix},$$

and we can recover the input since $\det(G(\omega)) = -2e^{-j\omega} \neq 0$.
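The ill-conditioning near $\tau = 0$ and $\tau = 2$ is visible in the size of the determinant, $|\det G(\omega)| = |e^{j\pi\tau} - 1| = 2|\sin(\pi\tau/2)|$, which does not depend on $\omega$. The short sketch below evaluates it for a few delays.

```python
import cmath
import math

def delay_det(w, tau):
    """det G(w) for G0(w) = 1, G1(w) = exp(-j w tau), as in Example 4.20."""
    return cmath.exp(-1j * (w - math.pi) * tau) - cmath.exp(-1j * w * tau)

for tau in (0.01, 0.1, 0.5, 1.0, 1.9, 1.99):
    # |det| = |exp(j pi tau) - 1| = 2 |sin(pi tau / 2)|, the same for every w
    print(f"tau = {tau:4}: |det| = {abs(delay_det(0.7, tau)):.4f}")
```

The determinant magnitude peaks at 2 for $\tau = 1$ (uniform sampling) and shrinks linearly as $\tau$ approaches either end of the interval.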



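The two-channel case in Figure 4.28 just discussed can be readily extended to
$N$ channels and bandlimited functions of arbitrary bandwidth:

Theorem 4.17 (Multichannel sampling (Papoulis)) Let $x(t) \in
BL[-\omega_0/2, \omega_0/2]$, and let $T$ be a sampling period with $T = 2\pi/\omega_0$. Consider
an $N$-channel filter bank with filters $g_i(t)$, $i = 0, 1, \ldots, N - 1$, followed
by uniform sampling with period $NT$. A necessary and sufficient condition for
recovery of $x(t)$ is that the matrix

$$G(\omega) = \begin{bmatrix} G_0(\omega) & G_0\left(\omega + \frac{2\pi}{NT}\right) & \cdots & G_0\left(\omega + \frac{2\pi(N-1)}{NT}\right) \\ G_1(\omega) & G_1\left(\omega + \frac{2\pi}{NT}\right) & \cdots & G_1\left(\omega + \frac{2\pi(N-1)}{NT}\right) \\ \vdots & \vdots & \ddots & \vdots \\ G_{N-1}(\omega) & G_{N-1}\left(\omega + \frac{2\pi}{NT}\right) & \cdots & G_{N-1}\left(\omega + \frac{2\pi(N-1)}{NT}\right) \end{bmatrix} \tag{4.67}$$

be nonsingular for $\omega \in [0, 2\pi/(NT)]$.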
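Checking the Papoulis condition numerically requires only the determinant of (4.67) on a grid of $\omega$. The sketch below uses a generic Gaussian-elimination determinant and, as a hypothetical example generalizing Example 4.20, pure-delay channels $G_i(\omega) = e^{-j\omega\tau_i}$. For these the matrix is a Vandermonde matrix (up to row phases), nonsingular exactly when the delays are distinct modulo $NT$.

```python
import cmath
import math

def det(M):
    """Determinant of a square complex matrix by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in M]
    n = len(A)
    d = 1 + 0j
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        if abs(A[p][i]) == 0:
            return 0j
        if p != i:
            A[i], A[p] = A[p], A[i]
            d = -d
        d *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
    return d

def papoulis_matrix(filters, w, T):
    """Matrix (4.67): row i holds G_i(w + 2 pi k / (N T)) for k = 0, ..., N - 1."""
    N = len(filters)
    return [[Gi(w + 2 * math.pi * k / (N * T)) for k in range(N)] for Gi in filters]

def delay_filters(taus):
    """Hypothetical channels: channel i delays by tau_i, G_i(w) = exp(-j w tau_i)."""
    return [lambda w, tau=tau: cmath.exp(-1j * w * tau) for tau in taus]

T = 1.0
for taus in ([0, 1, 2], [0, 0.7, 1.9], [0, 0.7, 0.7]):
    print(taus, abs(det(papoulis_matrix(delay_filters(taus), 0.4, T))))
```

Uniform delays $(0, 1, 2)$ give $|\det G(\omega)| = 3^{3/2}$, a repeated delay makes the matrix singular, and irregular but distinct delays stay invertible.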
[Figure 4.29 panels: (a) Sampling. (b) Interpolation.]

Figure 4.29: Sampling and interpolation in $L^2(\mathbb{R})$ with nonorthogonal functions.



The proof is a direct extension of what we saw for two channels. Exercise 4.15 explores
some ramifications of this result, in particular for derivatives and nonuniform
sampling.

4.4.3 Sampling and Interpolation with Nonorthogonal Functions 

As for sequences, what we have seen thus far is a classical take on sampling and 
interpolation; we now expand it to include nonorthogonal functions. These make 
the geometry more complicated; the sampling and interpolation operators are no 
longer adjoints of each other, and the appropriate spaces we discussed earlier, the 
range of the interpolation operator as well as the orthogonal complement of the null 
space of the sampling operator, are no longer the same. 

Sampling We now refer to the operation depicted in Figure 4.29(a), involving
filtering with $\tilde{g}(t)$ and sampling at $t = nT$, as sampling of the function $x(t) \in L^2(\mathbb{R})$
with prefilter $\tilde{g}(t)$ and denote it by $y_n = (\tilde{\Phi}^* x)_n$. Through this operation, we move
from the larger space $L^2(\mathbb{R})$ into the smaller one $\ell^2(\mathbb{Z})$. This time, we do not make
an assumption of orthonormality.

Then, the output of sampling is

$$y_n = \int_{-\infty}^{\infty} x(t)\, \tilde{g}(nT - t)\, dt \overset{(a)}{=} \int_{-\infty}^{\infty} x(t)\, \tilde{\varphi}(t - nT)\, dt = (\tilde{g} * x)(t)\big|_{t = nT} = \langle x(t), \tilde{\varphi}(t - nT) \rangle_t = (\tilde{\Phi}^* x)_n, \tag{4.68}$$

where (a) follows from $\tilde{\varphi}(t) = \tilde{g}(-t)$. As before, the sampling operator has a nontrivial
null space, $\tilde{S}^\perp = \mathcal{N}(\tilde{\Phi}^*)$; the set $\{\tilde{\varphi}(t - kT)\}_{k \in \mathbb{Z}}$ spans its orthogonal complement,
$\tilde{S} = \mathcal{N}(\tilde{\Phi}^*)^\perp = \mathrm{span}(\{\tilde{\varphi}(t - kT)\}_{k \in \mathbb{Z}})$. In other words, when a function $x(t) \in L^2(\mathbb{R})$
is sampled, the component that remains is in $\tilde{S}$ and is captured by $\tilde{\Phi}^* x$; the component that is
lost due to sampling is in the null space $\tilde{S}^\perp$.

Interpolation Again, we refer to the operation depicted in Figure 4.29(b), involving
pointwise multiplication with a Dirac delta comb function $\text{Ш}_T(t)$, (3.7), and filtering
with $g(t)$, as interpolation of the sequence $y_n \in \ell^2(\mathbb{Z})$ with postfilter $g(t)$, and denote
it by $\hat{x}(t) = (\Phi y)(t)$, but this time it is not the adjoint of the sampling operator
$\tilde{\Phi}^*$. For a given input vector $y_n \in \ell^2(\mathbb{Z})$, the interpolation output is a function
$\hat{x}(t) \in L^2(\mathbb{R})$; $\Phi$ looks the same as in (4.44). When the interpolation operator is
specially chosen so that it formally satisfies (4.18), that is, $\Phi$ is the pseudoinverse
of $\tilde{\Phi}^*$, then $S = \tilde{S}$, by the same arguments as in (4.19).

[Figure 4.30 panels: (a) Interpolation followed by sampling. (b) Sampling followed by interpolation.]

Figure 4.30: Sampling and interpolation in $L^2(\mathbb{R})$ with nonorthogonal functions.

Example 4.21 (Interpolation in $\ell^2(\mathbb{Z})$) Choose $T = 1$ and the postfilter to
be the hat function from (3.49a),

$$\varphi(t) = g(t) = \begin{cases} 1 - |t|, & |t| < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{4.69}$$

as the generator of a subspace $S$ shift-invariant with respect to integer shifts. The
subspace $S = \mathrm{span}(\{\varphi(t - k)\}_{k \in \mathbb{Z}})$ is the space of continuous piecewise linear functions
with changes of derivative at the integers.
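Since $\varphi(k) = \delta_k$ at the integers, interpolating with the hat generator reproduces the samples and joins them by straight lines, that is, ordinary piecewise-linear interpolation. A minimal sketch for a finite list of samples at the integers ($T = 1$), with hypothetical helper names:

```python
def hat(t):
    """Hat function (4.69): 1 - |t| for |t| < 1, zero elsewhere."""
    return max(0.0, 1.0 - abs(t))

def interpolate(y, t):
    """x_hat(t) = sum_n y_n hat(t - n) for samples y_0, ..., y_{N-1} at the integers (T = 1)."""
    return sum(yn * hat(t - n) for n, yn in enumerate(y))

y = [0.0, 2.0, 1.0, 3.0]
print([interpolate(y, t) for t in (1.0, 1.5, 2.25)])
```

This prints `[2.0, 1.5, 1.5]`: the sample value at $t = 1$ and two points on the connecting line segments.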



Interpolation Followed by Sampling Interpolation followed by sampling is described
by $\tilde{\Phi}^*\Phi$, as in Figure 4.30(a). It is possible for $y_n$ to be perfectly recovered;
this happens when

$$\tilde{\Phi}^*\Phi = I, \quad \text{that is,} \quad \langle \varphi(t - nT), \tilde{\varphi}(t - kT) \rangle_t = \delta_{n-k}. \tag{4.70}$$

The sampling and interpolation operators are then called consistent. Choosing the
pseudoinverse in (4.18) for $\Phi$ would satisfy (4.70); of course, there exist infinitely
many other $\Phi$ we could also use. The above also shows that the condition for perfect
recovery is the same as the sets of functions $\{\varphi(t - kT)\}_{k \in \mathbb{Z}}$ and $\{\tilde{\varphi}(t - kT)\}_{k \in \mathbb{Z}}$
being biorthogonal, as in (1.102). These sets of functions are not bases for $L^2(\mathbb{R})$;
instead, they form a biorthogonal pair of bases for the subspaces $S$ and $\tilde{S}$ they span,
respectively.

Example 4.22 (Interpolation followed by sampling in $\ell^2(\mathbb{Z})$) We start
with $T = 1$, and assume $\varphi$ to be as in (4.69). We would like to find a sampling
operator such that (4.70) is satisfied.

There exist many choices for $\tilde{\varphi}(t)$; for example, fix the support of $\tilde{\varphi}(t)$ to
be $[-1/2, 1/2]$ so that (4.70) is satisfied for $|n - k| > 1$ solely because the functions
do not overlap. Next, make $\tilde{\varphi}(t)$ an even function because, combined with the
fact that $\varphi(t - 1)$ is a time-reversed version of $\varphi(t + 1)$, it will yield the same
constraint for $n - k = \pm 1$. Many possible parametrizations of $\tilde{\varphi}(t)$ on $[0, 1/2]$
with two parameters allow us to determine those parameters uniquely by
enforcing (4.70) for $n - k = 0, 1$. Assume thus $\tilde{\varphi}$ to be of the following form:

$$\tilde{\varphi}(t) = \begin{cases} a(b - |t|), & |t| \le 1/2; \\ 0, & \text{otherwise}. \end{cases}$$

We thus get the following system of equations:

$$\langle \varphi(t), \tilde{\varphi}(t) \rangle = 2\int_0^{1/2} (1 - t)\, a(b - t)\, dt = \frac{a(9b - 2)}{12} = 1,$$
$$\langle \tilde{\varphi}(t), \varphi(t - 1) \rangle = \int_0^{1/2} t\, a(b - t)\, dt = \frac{a(3b - 1)}{24} = 0,$$

with the solution $a = 12$, $b = 1/3$, or

$$\tilde{\varphi}(t) = \begin{cases} 4 - 12|t|, & |t| \le 1/2; \\ 0, & \text{otherwise}. \end{cases} \tag{4.71}$$

Functions $\varphi(t)$ from (4.69) and $\tilde{\varphi}(t)$ from (4.71) are plotted in Figure 4.31(a)
and (b). With their integer shifts, these functions span different spaces; that is,
$\tilde{S} \neq S$. By inspection of $\varphi$ and $\tilde{\varphi}$, we can see that functions in $S$ and $\tilde{S}$ are
piecewise linear. The functions in $S$ are continuous and may have a change of
derivative at the integers. The functions in $\tilde{S}$ are generally discontinuous at all odd
multiples of $1/2$ and may have changes of derivative at the integers. In fact,
functions in $\tilde{S}$ always look like saw blades; Figure 4.31(d) gives an example for
$x(t) = \tilde{\varphi}(t + 1) + \tilde{\varphi}(t) + \tilde{\varphi}(t - 1)$.
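The biorthogonality conditions (4.70) for $\tilde{\varphi}(t) = 4 - 12|t|$ can be verified by direct quadrature. The sketch below integrates products of these piecewise-linear functions exactly (two-point Gauss-Legendre on half-unit cells aligned with every breakpoint) and should return $\delta_k$ for $\langle \tilde{\varphi}(t), \varphi(t - k) \rangle$; the helper names are hypothetical.

```python
import math

def hat(t):
    """Interpolation generator phi(t) from (4.69)."""
    return max(0.0, 1.0 - abs(t))

def dual(t, a=12.0, b=1.0 / 3.0):
    """Sampling generator phi_tilde(t) = a (b - |t|) on |t| <= 1/2, zero elsewhere."""
    return a * (b - abs(t)) if abs(t) <= 0.5 else 0.0

def inner(f, h, lo=-3.0, hi=3.0, cell=0.5):
    """Integral of f(t) h(t) over [lo, hi] using 2-point Gauss-Legendre on each cell.

    All breakpoints of the functions used here are multiples of 1/2, so on every
    cell the integrand is a polynomial of degree <= 2 and the rule is exact up to
    rounding.
    """
    off = cell / (2 * math.sqrt(3))
    total = 0.0
    for i in range(round((hi - lo) / cell)):
        mid = lo + (i + 0.5) * cell
        total += 0.5 * cell * (f(mid - off) * h(mid - off) + f(mid + off) * h(mid + off))
    return total

for k in range(-2, 3):
    print(k, inner(dual, lambda t, k=k: hat(t - k)))
```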



Sampling Followed by Interpolation Sampling followed by interpolation is described
by $P = \Phi\tilde{\Phi}^*$, as in Figure 4.30(b). When the sampling and interpolation
operators are consistent as in (4.70), then $P$ is idempotent, meaning it is a projection
operator. It projects onto $S$, but the projection is not orthogonal. The
approximation error $x(t) - \hat{x}(t)$ is orthogonal to $\tilde{S}$ but not to $S$ (recall Figure 4.1
for a conceptual picture).

Again, for $P$ to be self-adjoint as well, $\Phi$ must be chosen to be the pseudoinverse
of $\tilde{\Phi}^*$, (4.18): the sampling and interpolation operators are then ideally
matched, and subspaces $S$ and $\tilde{S}$ are identical.

The previous discussion can be summarized as follows: 

Theorem 4.18 (Recovery for functions) Given is the system as in
Figure 4.30(b) with sampling operator $\tilde{\Phi}^*$ from (4.68) and interpolation operator
$\Phi$ from (4.44). Then, with $S = \mathcal{R}(\Phi)$ and $\tilde{S} = \mathcal{N}(\tilde{\Phi}^*)^\perp$:



a.3.0 [October 2011] CC by-nc-nd Comments to book-errata@FourierAndWavelets.org



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal










[Figure 4.31 panels: (c) $\tilde{\varphi}(t)$. (d) $x(t) \in \tilde{S}$.]

Figure 4.31: Sampling and interpolation for functions. (a) Interpolation postfilter $\varphi(t)$
from (4.69). (b) Sampling prefilter $\tilde{\varphi}(t)$ from (4.71) resulting in projection. (c) Sampling
prefilter $\tilde{\varphi}(t)$ from (4.72) with coefficients as in (4.76) resulting in orthogonal projection.
(d) A function $x(t) = \tilde{\varphi}(t + 1) + \tilde{\varphi}(t) + \tilde{\varphi}(t - 1)$ from $\tilde{S}$ using $\tilde{\varphi}(t)$ from (b).



(i) When $P = \Phi\tilde{\Phi}^*$ is idempotent, that is, sampling and interpolation operators
are consistent, then $P$ is a projection operator. When $x(t) \in S$, $x(t)$ is
perfectly recovered.

(ii) When $P = \Phi\tilde{\Phi}^*$ is idempotent and self-adjoint, that is, sampling and interpolation
operators are consistent and ideally matched, then $P$ is an orthogonal
projection operator. When $x(t) \in S$, $x(t)$ is perfectly recovered.



Example 4.23 (Sampling followed by interpolation in $L^2(\mathbb{R})$) For $P =
\Phi\tilde{\Phi}^*$ to be an orthogonal projection operator, apart from consistency, the operators
must be ideally matched, that is, $\tilde{S} = S$. For this we cannot make arbitrary
choices as in Example 4.22; in fact, $\tilde{\varphi}(t)$ is then uniquely determined.

The key to finding $\tilde{\varphi}$ such that $\tilde{S} = S$ is that if $\tilde{\varphi} \in S$, then integer shifts
of $\tilde{\varphi}$ generate the same space $S$. Thus, let

$$\tilde{\varphi}(t) = \sum_{k \in \mathbb{Z}} \alpha_k\, \varphi(t - kT) \tag{4.72}$$

for some sequence $\alpha_k$ to be determined. We do this as follows:

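$$\langle \varphi(t - nT), \tilde{\varphi}(t - kT) \rangle \overset{(a)}{=} \Big\langle \varphi(t - nT), \sum_{\ell \in \mathbb{Z}} \alpha_\ell\, \varphi(t - kT - \ell T) \Big\rangle \overset{(b)}{=} \sum_{\ell \in \mathbb{Z}} \alpha_\ell \langle \varphi(t - nT), \varphi(t - kT - \ell T) \rangle \overset{(c)}{=} \sum_{\ell \in \mathbb{Z}} \alpha_\ell\, a_{n - k - \ell} \overset{(d)}{=} \delta_{n-k}, \tag{4.73}$$

where (a) follows from (4.72); (b) from the linearity of the inner product; in (c)
we defined an autocorrelation sequence

$$a_m = \langle \varphi(t), \varphi(t + mT) \rangle, \tag{4.74}$$

the autocorrelation of $\varphi(t)$ evaluated at multiples of $T$ (at the integers for $T = 1$); and (d)
follows from our assumption that the operators must be consistent as in (4.70).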

To find the sequence $\alpha_k$, we recognize (4.73) as a convolution between two
sequences $\alpha_n$ and $a_n$. In the z-transform domain, we can rephrase that as

$$\alpha(z) A(z) = 1. \tag{4.75}$$

For $\varphi(t)$ from (4.69), $A(z) = (z + 4 + z^{-1})/6$, and thus

$$\alpha(z) = \frac{1}{A(z)} = \frac{6}{z^{-1} + 4 + z} = \frac{6c}{(1 + cz^{-1})(1 + cz)} = \frac{6c}{1 - c^2}\left(\frac{1}{1 + cz^{-1}} + \frac{1}{1 + cz} - 1\right),$$

with $c = 2 - \sqrt{3}$. This rational z-transform corresponds to the two-sided sequence

$$\alpha_k = \frac{6c}{1 - c^2}(-c)^{|k|}, \tag{4.76}$$

from which $\tilde{\varphi}(t)$ follows according to (4.72). Figure 4.31(c) shows this $\tilde{\varphi}(t)$.
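The inverse relation (4.75) can be checked on the sequences themselves: convolving $\alpha_k$ from (4.76) with the hat autocorrelation $a_0 = 2/3$, $a_{\pm 1} = 1/6$ should give a Kronecker delta. Because $\varphi(n - k) = \delta_{n-k}$, the values $\tilde{\varphi}(n)$ at the integers are simply $\alpha_n$. The sketch truncates $\alpha_k$ to $|k| \le 40$, where $c^{40}$ is negligible, and also shows that the constant $6c/(1 - c^2)$ equals $\sqrt{3}$.

```python
import math

c = 2 - math.sqrt(3)

def alpha(k):
    """Coefficients (4.76): alpha_k = 6c/(1 - c^2) * (-c)^{|k|}, with c = 2 - sqrt(3)."""
    return 6 * c / (1 - c * c) * (-c) ** abs(k)

# Autocorrelation (4.74) of the hat function at the integers: a_0 = 2/3, a_{+1} = a_{-1} = 1/6.
a = {-1: 1 / 6, 0: 2 / 3, 1: 1 / 6}

def alpha_conv_a(m, K=40):
    """(alpha * a)_m with alpha_k truncated to |k| <= K; (4.75) says this is delta_m."""
    return sum(alpha(k) * a.get(m - k, 0.0) for k in range(-K, K + 1))

print(alpha(0), math.sqrt(3))                   # 6c/(1 - c^2) equals sqrt(3)
print([round(alpha(k), 4) for k in range(-3, 4)])  # phi_tilde at the integers
```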

We will see in the next chapter how this example can be generalized by
defining $S^{(k)}$ as the space of functions that are piecewise polynomial of degree
$k$ with $k - 1$ continuous derivatives at the integer breakpoints. A biorthogonal
basis for $S^{(k)}$ is given by B-splines of order $k$, denoted by $\beta^{(k)}(t)$, where $k$ gives
the number of convolutions of the unit box function with itself.

4.5 Periodic Functions 

In Chapter 3 we studied two classes of functions, those on the real line in Section 3.2.1
and periodic functions in Section 3.2.2. We now consider sampling and
interpolation of $T$-periodic functions, (3.18), with a square-integrable period and
Fourier series coefficients $X_k$ as in (3.90a). We call the space of such functions
$L^2([-T/2, T/2))$; as we have seen in Chapter 3, operations in such spaces can be
thought of either as operations on periodic functions defined on $\mathbb{R}$ with a square-integrable
period, or operations modulo $T$ on functions in $L^2([-T/2, T/2))$.
We will go over a subset of topics we covered for functions on the real line in the
last section, those that are of importance in practice.

Shift-Invariant Subspaces of Periodic Functions As before, we start by introducing
shift-invariant subspaces of $L^2([-T/2, T/2))$.

Definition 4.19 (Shift-invariant subspaces of $L^2([-T/2, T/2))$) A subspace
$W \subset L^2([-T/2, T/2))$ is a shift-invariant subspace with respect to shift
$\tau \in [-T/2, T/2)$, with $\tau$ an integer divisor of $T$, when $x(t) \in W$ implies $x(t - k\tau) \in W$
for every integer $k$. In addition, $w \in L^2([-T/2, T/2))$ is called a generator of $W$
when $W = \mathrm{span}(\{w(t - k\tau)\}_{k \in \mathbb{Z}})$.



Subspaces of Bandlimited Periodic Functions A special case of shift-invariant
subspaces of particular importance in signal processing is the subspace of bandlimited
periodic functions; we now define it formally and later we look at forming
approximations in these subspaces through sampling and interpolation.

Definition 4.20 (Bandwidth for periodic functions) A periodic function
$x(t) \in L^2([-T/2, T/2))$ is said to have bandwidth $k_0$ for the smallest odd $k_0 \in \mathbb{Z}^+$
such that the Fourier series $X_k$ satisfies

$$X_k = 0 \quad \text{for all } |k| > \frac{k_0 - 1}{2}. \tag{4.77}$$

Note that $k_0$ is odd because for a real function, $X_k = X^*_{-k}$, see Table 3.4.

Definition 4.21 (Subspace of bandlimited periodic functions) A subspace
$BL\{-(k_0 - 1)/2, \ldots, (k_0 - 1)/2\} \subset L^2([-T/2, T/2))$ is a subspace of bandlimited
periodic functions when all $x(t) \in BL\{-(k_0 - 1)/2, \ldots, (k_0 - 1)/2\}$ have
bandwidth at most $k_0$.



A subspace of bandlimited periodic functions is shift invariant for any shift
$\tau \in [-T/2, T/2)$; as before, subspaces of bandlimited periodic functions are the only
ones that are simultaneously shift invariant for all shifts $\tau \in [-T/2, T/2)$. To see
shift invariance, take $x(t) \in BL\{-(k_0 - 1)/2, \ldots, (k_0 - 1)/2\}$. Then, from (3.102),

$$x(t - \tau) \;\longleftrightarrow\; e^{-j(2\pi/T)k\tau} X_k;$$

the Fourier series coefficients are multiplied by a complex exponential, which does not change the
bandwidth of the shifted function.
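Definition 4.20 and the shift argument can be illustrated on a trigonometric polynomial. The sketch below assumes the Fourier series normalization $X_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)e^{-j(2\pi/T)kt}\,dt$, computes the coefficients with a rectangle rule (exact for trigonometric polynomials of low enough degree), reads off the bandwidth, and confirms that a shift by $\tau$ only multiplies $X_k$ by $e^{-j(2\pi/T)k\tau}$. The test function is an illustrative choice in $BL\{-2, \ldots, 2\}$.

```python
import cmath
import math

def fourier_coeffs(x, T, kmax, n=64):
    """X_k = (1/T) * integral over one period of x(t) exp(-j 2 pi k t / T) dt, for |k| <= kmax.

    Approximated by an n-point rectangle rule over [-T/2, T/2), which is exact for
    trigonometric polynomials of degree below n - kmax.
    """
    ts = [-T / 2 + i * T / n for i in range(n)]
    return {k: sum(x(t) * cmath.exp(-2j * math.pi * k * t / T) for t in ts) / n
            for k in range(-kmax, kmax + 1)}

def bandwidth(X, tol=1e-9):
    """Smallest odd k0 with X_k = 0 for all |k| > (k0 - 1)/2 (Definition 4.20)."""
    kh = max((abs(k) for k, v in X.items() if abs(v) > tol), default=0)
    return 2 * kh + 1

T, tau = 1.0, 0.13

def x(t):  # illustrative member of BL{-2, ..., 2}
    return 1 + 2 * math.cos(2 * math.pi * 2 * t / T) + math.sin(2 * math.pi * t / T)

X = fourier_coeffs(x, T, kmax=6)
Xs = fourier_coeffs(lambda t: x(t - tau), T, kmax=6)
print(bandwidth(X), bandwidth(Xs))
```

Both bandwidths come out as $k_0 = 5$.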
[Figure 4.32 panels: (a) Sampling. (b) Interpolation.]

Figure 4.32: Sampling and interpolation in $L^2([-T/2, T/2))$ with orthonormal periodic
functions.



4.5.1 Sampling and Interpolation with Orthonormal Periodic Functions

We now repeat the sequence of steps in examining the operations of sampling and 
interpolation on periodic functions. 



Sampling We refer to the operation depicted in Figure 4.32(a), involving circular
convolution with $g(-t)$ and sampling at $t = nT_s$, with $T_s$ an integer divisor of $T$, as
sampling of the function $x(t) \in L^2([-T/2, T/2))$ with prefilter $g(-t)$, and denote it
by $y_n = (\Phi^* x)_n$. Calling the number of sampling points $k_s = T/T_s = 2k_h + 1$,⁸⁴
through this operation we move from the larger space, $L^2([-T/2, T/2))$, into the
smaller one, $\mathbb{C}^{k_s}$.

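The output of sampling is

$$y_n = (g(-t) \circledast x(t))\big|_{t = nT_s} = \int_{-T/2}^{T/2} x(t)\, g(t - nT_s)\, dt \overset{(a)}{=} \int_{-T/2}^{T/2} x(t)\, \varphi(t - nT_s)\, dt = \langle x(t), \varphi(t - nT_s) \rangle_t = (\Phi^* x)_n, \tag{4.78}$$

where (a) follows from $\varphi(t) = g(t)$. We assume $\{\varphi(t - nT_s)\}_{n=-k_h}^{k_h}$ to be an
orthonormal set,

$$\langle \varphi(t - nT_s), \varphi(t - \ell T_s) \rangle = \delta_{n - \ell} \quad \Leftrightarrow \quad \Phi^*\Phi = I. \tag{4.79}$$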



As before, the sampling operator has a nontrivial null space, $S^\perp = \mathcal{N}(\Phi^*)$; the set
$\{\varphi(t - nT_s)\}_{n=-k_h}^{k_h}$ spans its orthogonal complement, $S = \mathcal{N}(\Phi^*)^\perp = \mathrm{span}(\{\varphi(t -
nT_s)\}_{n=-k_h}^{k_h})$. In other words, when a function $x(t) \in L^2([-T/2, T/2))$ is sampled,
the component that remains is in $S$ and is captured by $\Phi^* x$; the component that is
lost due to sampling is in the null space $S^\perp$.



⁸⁴ We will assume $k_s$ to be odd to be consistent with the notion of bandwidth introduced earlier.

[Figure 4.33 panels: (a) Interpolation followed by sampling. (b) Sampling followed by interpolation.]

Figure 4.33: Sampling and interpolation in $L^2([-T/2, T/2))$ with orthonormal periodic
functions.

Interpolation We refer to the operation depicted in Figure 4.32(b), involving
pointwise multiplication with a Dirac delta comb function $\text{Ш}_{T_s}(t)$, (3.7), and circular
convolution with $g(t)$, as interpolation of the sequence $y_n \in \mathbb{C}^{k_s}$ with postfilter
$g(t)$, and denote it by $\hat{x}(t) = (\Phi y)(t)$.
The output of interpolation is

$$\hat{x}(t) = \sum_{n=-k_h}^{k_h} y_n\, g(t - nT_s) \overset{(a)}{=} \sum_{n=-k_h}^{k_h} y_n\, \varphi(t - nT_s) = (\Phi y)(t), \tag{4.80}$$



where (a) follows from $\varphi(t) = g(t)$. Denoting the range of $\Phi$ by $S$ as before, this
subspace is, of course, the same as the orthogonal complement of the null space of
the sampling operator, as we have seen earlier. It is also a shift-invariant subspace
with respect to integer multiples of $T_s$; a shift of $\hat{x}$ by $nT_s$ is still in $S$. The way we
chose pre- and postfilters, the sampling and interpolation operators are adjoints of
each other; the proof mimics (4.45) and is left for Exercise 4.16.

Interpolation Followed by Sampling Interpolation followed by sampling is described
by $\Phi^*\Phi$ as in Figure 4.33(a),

$$(\Phi^*\Phi y)_n \overset{(a)}{=} \Phi^*\Big(\sum_{k=-k_h}^{k_h} y_k\, g(t - kT_s)\Big) \overset{(b)}{=} \int_{-T/2}^{T/2} \Big(\sum_{k=-k_h}^{k_h} y_k\, g(t - kT_s)\Big) g(t - nT_s)\, dt \overset{(c)}{=} \sum_{k=-k_h}^{k_h} y_k \int_{-T/2}^{T/2} g(t - kT_s)\, g(t - nT_s)\, dt \overset{(d)}{=} y_n,$$



where (a) follows from the expression for the interpolation operator, (4.80); (b) from
the expression for the sampling operator, (4.78); in (c) we interchanged summation
and integration; and (d) follows from our assumption, (4.79), that $\{\varphi(t - kT_s)\}_{k=-k_h}^{k_h}$
is an orthonormal set. Thus, $y_n$ is perfectly recovered. Equation (4.79) also shows
that the condition for perfect recovery is the same as the set of functions $\{\varphi(t -
kT_s)\}_{k=-k_h}^{k_h}$ being orthonormal, as in (1.83). This set of functions is not a basis for
$L^2([-T/2, T/2))$; instead, it is an orthonormal basis for the subspace $S$ it spans.

Figure 4.34: Figure 4.33(b) between sampling prefilter and interpolation postfilter:
sampling of the periodic function $y(t)$ results in a sampled function $y_s(t)$.

Sampling Followed by Interpolation Sampling followed by interpolation is described
by $P = \Phi\Phi^*$ as in Figure 4.33(b). As for functions, and given our choice
of sampling and interpolation operators, $P$ is idempotent and self-adjoint, that is,
$P$ is an orthogonal projection operator. Then, by Theorem 1.26, $(Px)(t)$ is the
best least-squares approximation of $x(t)$ in $S$; if $x(t) \in S$, sampling followed by
interpolation will perfectly recover $x(t)$:

Theorem 4.22 (Recovery for periodic functions) Given is the system as
in Figure 4.33(b) with sampling operator $\Phi^*$ from (4.78) and interpolation operator
$\Phi$ satisfying (4.79). Then, with $S = \mathcal{R}(\Phi)$,

$$\hat{x}(t) = (\Phi\Phi^* x)(t) \tag{4.81}$$

is the best least-squares approximation of $x(t)$ in $S$, that is,

$$\hat{x}(t) = \arg\min_{x_S(t) \in S} \|x(t) - x_S(t)\|^2, \qquad x(t) - \hat{x}(t) \perp S.$$

When $x(t) \in S$, then $\hat{x}(t) = x(t)$.



4.5.2 Sampling and Interpolation for Bandlimited Periodic Functions

Since a subspace of bandlimited functions is also a shift-invariant subspace, everything we have seen thus far holds here as well. In particular, if $x(t) \in BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$, the sampling operator $\Phi^*$ is given by (4.78), and the interpolation operator is $\Phi$, then by Theorem 4.22, $x(t)$ is perfectly recovered after sampling followed by interpolation. For $P$ to orthogonally project onto that bandlimited subspace, we need to be more specific about the operators forming the projection $P = \Phi\Phi^*$. We now establish what they must be using the machinery from Chapter 3.

Figure 4.35: Illustration of sampling from Figure 4.34. (a) A bandlimited periodic function $y(t) \in BL\{-2, \ldots, 2\}$ with period $T = 1$. (b) Its bandlimited Fourier series $Y_k$. (c) The sampled version of $y(t)$ with $T_s = T/5 = 1/5$, or, $k_s = 5$. (d) Its periodic Fourier series $Y_{s,k}$.

As we did for functions, we first discuss the sequence of operations between the sampling prefilter and the interpolation postfilter in Figure 4.33(b), depicted separately in Figure 4.34: the periodic function $y(t)$ multiplied by the Dirac delta comb function $s_{T_s}(t)$ produces a sampled function $y_s(t)$. An example function $y(t)$, its sampled version $y_s(t)$, and their respective Fourier series are depicted in Figure 4.35. The Dirac delta comb Fourier series pair is (the proof is left for Exercise 4.17)



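$$s_{T_s}(t) = \sum_{n\in\mathbb{Z}} \delta(t - nT_s) = \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} \delta(t - \ell T - nT_s), \qquad (4.82a)$$

$$S_{T_s,k} = \frac{1}{T_s} \sum_{\ell\in\mathbb{Z}} \delta_{k - \ell k_s}, \qquad (4.82b)$$

where we represented the periodic function $s_{T_s}(t)$ as the sum over individual periods. The sampled function $y_s(t)$ can now be compactly represented as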



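$$
y_s(t) = y(t)\, s_{T_s}(t) = y(t) \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} \delta(t - \ell T - nT_s)
\overset{(a)}{=} \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} y(\ell T + nT_s)\, \delta(t - \ell T - nT_s)
\overset{(b)}{=} \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} y(nT_s)\, \delta(t - \ell T - nT_s), \qquad (4.83)
$$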
where (a) follows from the sampling property of the Dirac delta function in Table 3.1 and (b) from the periodicity of $y(t)$. Let us now find the Fourier series of $y_s(t)$ as well as the DFT of $y_n$:



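$$
\begin{aligned}
Y_{s,k} &\overset{(a)}{=} \frac{1}{T} \int_{-T/2}^{T/2} \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} y(nT_s)\, \delta(t - \ell T - nT_s)\, e^{-j(2\pi/T)kt}\, dt \\
&\overset{(b)}{=} \frac{1}{T} \sum_{n=-k_h}^{k_h} \sum_{\ell\in\mathbb{Z}} \int_{-T/2}^{T/2} y(nT_s)\, \delta(t - \ell T - nT_s)\, e^{-j(2\pi/T)kt}\, dt \\
&\overset{(c)}{=} \frac{1}{T} \sum_{n=-k_h}^{k_h} y(nT_s) \int_{-\infty}^{\infty} \delta(\tau - nT_s)\, e^{-j(2\pi/T)k\tau}\, d\tau \\
&\overset{(d)}{=} \frac{1}{T} \sum_{n=-k_h}^{k_h} y(nT_s)\, e^{-j(2\pi/T)knT_s}
\overset{(e)}{=} \frac{1}{T} \sum_{n=-k_h}^{k_h} y(nT_s)\, e^{-j(2\pi/k_s)kn}, \qquad (4.84a) \\
Y_{d,k} &\overset{(f)}{=} \sum_{n=-k_h}^{k_h} y(nT_s)\, e^{-j(2\pi/k_s)kn}, \qquad (4.84b)
\end{aligned}
$$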

where (a) follows from the definition of the Fourier series, (3.90b); in (b) we exchanged the order of summation and integration; in (c) we changed variable $\tau = t - \ell T$ (using $e^{-j(2\pi/T)k\ell T} = 1$) and concatenated integrals over periods of length $T$ into a single integral over the real line; (d) follows from the sifting property of the Dirac delta function in Table 3.1; (e) from $T/T_s = k_s$; and (f) from the definition of the DFT, (2.159a). From this, we see that the Fourier series of the sampled function and the DFT of the sequence of samples are related by

$$Y_{d,k} = T\, Y_{s,k}, \qquad (4.85)$$

that is, they are the same up to scaling by $T$.

More often, we will use the following alternative version of $Y_{s,k}$, as it allows easier analysis of aliasing:

$$Y_{s,k} \overset{(a)}{=} S_{T_s,k} * Y_k \overset{(b)}{=} \Big(\frac{1}{T_s} \sum_{\ell\in\mathbb{Z}} \delta_{k - \ell k_s}\Big) * Y_k \overset{(c)}{=} \frac{1}{T_s} \sum_{\ell\in\mathbb{Z}} Y_{k - \ell k_s}, \qquad (4.86)$$

where (a) follows from the convolution in frequency, (3.108); (b) from (4.82b); and (c) from the shifting property of the Dirac delta function (see Table 3.1). This expression is the counterpart to (4.51); namely, the Fourier series of the sampled periodic function, $Y_{s,k}$, consists of replicas $Y_{k - k_s\ell}$ of the original spectrum $Y_k$, shifted by integer multiples of $k_s$ (see Figure 4.35(d)); as before, this will lead to a discussion of aliasing, to appear shortly.
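As a numerical sanity check of (4.85) and (4.86), the following Python sketch samples a real periodic function with a handful of hypothetical Fourier series coefficients (one of them at $|k| = 4 > k_h$, so that aliasing occurs), computes the DFT of the $k_s$ samples over one period, and compares it with $T\,Y_{s,k} = (T/T_s)\sum_\ell Y_{k-\ell k_s}$. The coefficients, $T = 1$ and $k_s = 5$ are illustrative assumptions, not values from the text.

```python
import cmath
import math

T = 1.0                  # period of y(t) (assumed for illustration)
ks = 5                   # samples per period, k_s = T / T_s (odd, so k_h = (k_s - 1) / 2)
kh = (ks - 1) // 2
Ts = T / ks

# Hypothetical Fourier series coefficients Y_k of a real y(t); |k| = 4 exceeds k_h and aliases
Y = {0: 0.5, 1: 0.3, -1: 0.3, 2: 0.1j, -2: -0.1j, 4: 0.2, -4: 0.2}


def y(t):
    # Inverse Fourier series: y(t) = sum_k Y_k e^{j 2 pi k t / T}
    return sum(c * cmath.exp(2j * math.pi * k * t / T) for k, c in Y.items())


def dft_of_samples(k):
    # Y_{d,k} = sum_{n=-k_h}^{k_h} y(n T_s) e^{-j (2 pi / k_s) k n}
    return sum(y(n * Ts) * cmath.exp(-2j * math.pi * k * n / ks) for n in range(-kh, kh + 1))


def aliased_prediction(k):
    # T Y_{s,k} = (T / T_s) sum_l Y_{k - l k_s}; only finitely many Y_m are nonzero here
    return (T / Ts) * sum(c for m, c in Y.items() if (k - m) % ks == 0)


def max_alias_error():
    return max(abs(dft_of_samples(k) - aliased_prediction(k)) for k in range(-kh, kh + 1))


print(max_alias_error())   # at rounding level
```

For $k = 1$ the prediction is $5(Y_1 + Y_{-4})$: the component at frequency $-4$ folds onto frequency $1$.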
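Time domain $\longleftrightarrow$ Fourier domain

Functions:

$$y(t) \;\overset{\mathrm{FT}}{\longleftrightarrow}\; Y(\omega)$$

$$y_n = y(nT) \;\overset{\mathrm{DTFT}}{\longleftrightarrow}\; Y(e^{j\omega}) = \sum_{n\in\mathbb{Z}} y(nT)\, e^{-j\omega n} = Y_s\Big(\frac{\omega}{T}\Big) = \frac{1}{T} \sum_{k\in\mathbb{Z}} Y\Big(\frac{\omega}{T} - \frac{2\pi}{T}k\Big)$$

$$y_s(t) = \sum_{n\in\mathbb{Z}} y(nT)\, \delta(t - nT) \;\overset{\mathrm{FT}}{\longleftrightarrow}\; Y_s(\omega) = \sum_{n\in\mathbb{Z}} y(nT)\, e^{-j\omega nT} = \frac{1}{T} \sum_{k\in\mathbb{Z}} Y\Big(\omega - \frac{2\pi}{T}k\Big)$$

$$s_T(t) = \sum_{n\in\mathbb{Z}} \delta(t - nT) \;\overset{\mathrm{FT}}{\longleftrightarrow}\; S_T(\omega) = \frac{2\pi}{T} \sum_{k\in\mathbb{Z}} \delta\Big(\omega - \frac{2\pi}{T}k\Big)$$

Periodic functions:

$$y(t) \;\overset{\mathrm{FS}}{\longleftrightarrow}\; Y_k$$

$$y_n = y(nT_s) \;\overset{\mathrm{DFT}}{\longleftrightarrow}\; Y_{d,k} = \sum_{n=-k_h}^{k_h} y(nT_s)\, e^{-j(2\pi/k_s)kn} = T\, Y_{s,k} = \frac{T}{T_s} \sum_{\ell\in\mathbb{Z}} Y_{k - k_s\ell}$$

$$y_s(t) = \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} y(nT_s)\, \delta(t - \ell T - nT_s) \;\overset{\mathrm{FS}}{\longleftrightarrow}\; Y_{s,k} = \frac{1}{T} \sum_{n=-k_h}^{k_h} y(nT_s)\, e^{-j(2\pi/k_s)kn} = \frac{1}{T_s} \sum_{\ell\in\mathbb{Z}} Y_{k - k_s\ell}$$

$$s_{T_s}(t) = \sum_{\ell\in\mathbb{Z}} \sum_{n=-k_h}^{k_h} \delta(t - \ell T - nT_s) \;\overset{\mathrm{FS}}{\longleftrightarrow}\; S_{T_s,k} = \frac{1}{T_s} \sum_{\ell\in\mathbb{Z}} \delta_{k - k_s\ell}$$

Table 4.1: Summary of sampling relationships. For functions: function $y(t)$ and its Fourier transform $Y(\omega)$, the discrete sampled version $y_n$ and its DTFT $Y(e^{j\omega})$, and the continuous sampled version $y_s(t)$ and its Fourier transform $Y_s(\omega)$. For periodic functions: periodic function $y(t)$ and its Fourier series $Y_k$, the discrete sampled version $y_n$ and its DFT $Y_{d,k}$, and the continuous periodic sampled version $y_s(t)$ and its Fourier series $Y_{s,k}$.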



Equation (4.86), together with (4.85), leads to the expression connecting the DFT of a sequence of samples $y_n$ with the Fourier series of its underlying continuous-time periodic function $y(t)$:

$$Y_{d,k} = T\, Y_{s,k} = \frac{T}{T_s} \sum_{\ell\in\mathbb{Z}} Y_{k - k_s\ell}. \qquad (4.87)$$

Table 4.1 summarizes these relationships for both functions and periodic functions.
Assume now that $x(t) \in BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$ and that the sampling prefilter in Figure 4.33(b) is a simple multiplicative factor of $\sqrt{T_s}$,$^{85}$ so

$$y(t) = \sqrt{T_s}\, x(t), \qquad (4.88)$$

leading to

$$Y_{s,k} \overset{(a)}{=} \frac{1}{\sqrt{T_s}} \sum_{\ell\in\mathbb{Z}} X_{k - \ell k_s}, \qquad (4.89)$$

where (a) follows from (4.86). For some LSI postfilter $g(t)$ to recover $x(t)$, the Fourier series of its output, $T\, G_k Y_{s,k}$ (periodic convolution with $g(t)$ multiplies Fourier series coefficients and introduces a factor $T$), must equal $X_k$. We want the recovery to work for every $x(t) \in BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$, so we cannot count on any property of $X_k$ other than bandlimitedness (4.77). Thus, multiplication by $G_k$ must compensate for the factor $T/\sqrt{T_s}$ in the $\ell = 0$ term of $T\, Y_{s,k}$ and zero out all the other terms.

Whether the multiplication by $G_k$ will recover $X_k$ depends on whether the spectral replicas in (4.86) overlap. We now discuss two possibilities in more detail.

Aliasing When $k_0 > k_s$, spectral replicas overlap, and no LSI filtering will succeed in recovering $x(t)$ for every $x(t) \in BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$. As we have seen, this confusion of frequencies is called aliasing.

Sampling Theorem When $k_0 \le k_s$, spectral replicas do not overlap, and obtaining $\hat{x} = x$ requires $G_k$ to be an ideal filter with cutoff frequency $(k_0 - 1)/2$. Choosing exactly $k_0 = k_s = T/T_s$ uniquely determines the postfilter as

$$G_k = \begin{cases} \sqrt{T_s}/T, & |k| \le k_h; \\ 0, & \text{otherwise}, \end{cases} \quad\overset{\mathrm{FS}}{\longleftrightarrow}\quad g(t) = \frac{1}{\sqrt{T_s}}\, \frac{\operatorname{sinc}(\pi t/T_s)}{\operatorname{sinc}(\pi t/T)}. \qquad (4.90)$$

The proof that the above indeed form a Fourier series pair is left for Solved Exercise 4.5. This function is a scaled Dirichlet kernel.

Since $\{g(t - nT_s)\}_{n=-k_h}^{k_h}$ are orthonormal, we can choose the sampling prefilter to be $g(-t)$,$^{86}$ leading to an orthogonal projection operator $P$. Because $x(t) \in BL\{-k_h, \ldots, k_h\}$, this sampling prefilter has no effect on $x(t)$ other than the scaling by $\sqrt{T_s}$ in (4.88). By construction, the orthonormal set $\{g(t - nT_s)\}_{n=-k_h}^{k_h}$ is an orthonormal basis for $BL\{-k_h, \ldots, k_h\}$.
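The orthonormality of the shifts $\{g(t - nT_s)\}$ can be checked numerically. The sketch below, assuming $T = 1$ and $k_s = 7$ purely for illustration, builds $g(t)$ from its Fourier series coefficients in (4.90), compares it with the closed form, and evaluates the Gram matrix of the shifts with a Riemann sum, which is exact here because every integrand is a trigonometric polynomial of low degree.

```python
import cmath
import math

T, ks = 1.0, 7           # assumed period and number of samples per period (k_s odd)
kh = (ks - 1) // 2
Ts = T / ks


def G(k):
    # Fourier series coefficients of the interpolation postfilter, (4.90)
    return math.sqrt(Ts) / T if abs(k) <= kh else 0.0


def g(t):
    # g(t) = sum_k G_k e^{j 2 pi k t / T}; real because G_k is real and even in k
    return sum(G(k) * cmath.exp(2j * math.pi * k * t / T) for k in range(-kh, kh + 1)).real


def g_closed(t):
    # (1 / sqrt(T_s)) sinc(pi t / T_s) / sinc(pi t / T) with sinc(v) = sin(v) / v;
    # this direct evaluation needs t not to be an integer multiple of T (including t = 0)
    a, b = math.pi * t / Ts, math.pi * t / T
    return (math.sin(a) / a) / (math.sin(b) / b) / math.sqrt(Ts)


def inner(k, n, M=64):
    # <g(t - k T_s), g(t - n T_s)> over one period; the M-point Riemann sum is exact because
    # the integrand is a trigonometric polynomial of degree at most 2 k_h < M
    dt = T / M
    return sum(g(-T / 2 + i * dt - k * Ts) * g(-T / 2 + i * dt - n * Ts) for i in range(M)) * dt


def max_gram_error():
    # largest deviation of the Gram matrix of the shifts from the identity
    return max(abs(inner(k, n) - (1.0 if k == n else 0.0))
               for k in range(-kh, kh + 1) for n in range(-kh, kh + 1))


print(max_gram_error())              # at rounding level: the shifts are orthonormal
print(g(0.123) - g_closed(0.123))    # at rounding level: both forms of g(t) agree
```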

The frequency $k_s = T/T_s$ is the Nyquist frequency for periodic functions; we see that it equals $2k_h + 1$, that is, one more than twice the maximum frequency $k_h$ of the spectrum of the input function, since the zero frequency has to be accounted for as well.

This discussion leads us to the sampling theorem for periodic functions: 



85 As for functions, this means that the sampling prefilter is not present; multiplication by $\sqrt{T_s}$ is purely for convenience.

86 We choose a time-reversed version to yield an orthogonal projection operator P. This time 
reversal has no effect on the ideal filter since its impulse response is symmetric. 
Figure 4.36: The interpolating function from (4.91) with $T = 5$ and $T_s = 1$.



Theorem 4.23 (Sampling theorem for periodic functions) Given is the system as in Figure 4.33(b) with interpolation postfilter $g(t)$ from (4.90). Then,

$$x(t) = \sum_{n=-k_h}^{k_h} x(nT_s)\, \frac{\operatorname{sinc}\big(\frac{\pi}{T_s}(t - nT_s)\big)}{\operatorname{sinc}\big(\frac{\pi}{T}(t - nT_s)\big)} \quad\Longleftrightarrow\quad x(t) \in BL\{-k_h, \ldots, k_h\}. \qquad (4.91)$$



The interpolating Dirichlet kernel does exactly as expected: at a given sampling point $t = nT_s$, both the numerator and the denominator sinc functions of the $n$th term equal 1, while the contributions from the other terms are 0 because their numerator sinc is 0 while their denominator sinc is not (see Figure 4.36).
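To see (4.91) at work, the sketch below reconstructs a hypothetical real function in $BL\{-2, \ldots, 2\}$ from its $k_s = 5$ samples per period, with $T = 5$ and $T_s = 1$ as in Figure 4.36, and compares the interpolated values with the true ones at a few points between the samples. The test function and the evaluation points are arbitrary choices.

```python
import cmath
import math

T, ks = 5.0, 5                 # T = 5 and T_s = 1 as in Figure 4.36, so k_s = T / T_s = 5
kh = (ks - 1) // 2
Ts = T / ks

# Hypothetical real function in BL{-k_h, ..., k_h}: X_{-k} is the conjugate of X_k
X = {0: 1.0, 1: 0.4 - 0.2j, -1: 0.4 + 0.2j, 2: 0.3j, -2: -0.3j}


def x(t):
    # inverse Fourier series with period T
    return sum(c * cmath.exp(2j * math.pi * k * t / T) for k, c in X.items()).real


def sinc(v):
    # sinc(v) = sin(v) / v, with the removable singularity at v = 0
    return 1.0 if v == 0 else math.sin(v) / v


def interpolate(t):
    # (4.91): sum over n = -k_h..k_h of x(n T_s) sinc(pi d / T_s) / sinc(pi d / T), d = t - n T_s
    total = 0.0
    for n in range(-kh, kh + 1):
        d = t - n * Ts          # |d| < T for t in [-T/2, T/2), so the denominator never vanishes
        total += x(n * Ts) * sinc(math.pi * d / Ts) / sinc(math.pi * d / T)
    return total


for t in (-2.3, -0.7, 0.0, 1.0, 2.4):
    print(t, interpolate(t) - x(t))   # differences at rounding level
```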

Bandlimited Approximation of Periodic Functions We now assume that $x(t) \notin BL\{-k_h, \ldots, k_h\}$. Then, as a corollary to Theorem 4.22, since $P$ is an orthogonal projection operator and $S = BL\{-k_h, \ldots, k_h\}$:



Theorem 4.24 (Best least-squares bandlimited approximation) Given is the system as in Figure 4.33(b) with interpolation postfilter $g(t)$ from (4.90). Then,

$$\hat{x}(t) = (\Phi\Phi^* x)(t) \qquad (4.92)$$

is the best least-squares approximation of $x(t)$ in $BL\{-k_h, \ldots, k_h\}$, that is,

$$\hat{x}(t) = \arg\min_{x_{BL}(t) \in BL\{-k_h, \ldots, k_h\}} \|x(t) - x_{BL}(t)\|_2, \qquad x(t) - \hat{x}(t) \perp BL\{-k_h, \ldots, k_h\}.$$

The effect of this approximation in the Fourier domain is a simple truncation of the spectrum of $x(t)$ to $\{-k_h, \ldots, k_h\}$:

$$\hat{X}_k = \begin{cases} X_k, & |k| \le k_h; \\ 0, & \text{otherwise}. \end{cases} \qquad (4.93)$$
Figure 4.37: Sampling the triangle wave. (a) The original periodic function $x(t)$. (b) Its bandlimited version $y(t) = \hat{x}(t) \in BL\{-2, \ldots, 2\}$. (c) The sampled version $y_n$ with $T_s = 0.2$. Its interpolated version is the best LS approximation $\hat{x}(t)$, and equals $y(t)$ from (b).



Example 4.24 (Sampling the triangle wave) Consider $x(t)$ of period 1 with one period as in (3.114), shown in Figure 4.37(a), with Fourier series coefficients as in (3.115); we repeat both here for completeness:

$$x(t) = \frac{1}{2} - |t|, \quad |t| \le \frac{1}{2} \qquad\overset{\mathrm{FS}}{\longleftrightarrow}\qquad X_k = \begin{cases} 1/4, & k = 0; \\ 0, & k \text{ even}, \ k \ne 0; \\ 1/(\pi k)^2, & k \text{ odd}. \end{cases} \qquad (4.94)$$
This function is clearly not bandlimited; by Theorem 4.24, the best least-squares approximation of $x(t)$ in $BL\{-k_h, \ldots, k_h\}$ is

$$\hat{x}(t) \overset{(a)}{=} \sum_{k\in\mathbb{Z}} \hat{X}_k\, e^{j2\pi kt} \overset{(b)}{=} \sum_{k=-k_h}^{k_h} X_k\, e^{j2\pi kt} \overset{(c)}{=} \frac{1}{4} + \sum_{\substack{m \ge 0 \\ 2m+1 \le k_h}} \frac{2}{(\pi(2m+1))^2} \cos(2\pi(2m+1)t), \qquad (4.95)$$

where (a) follows from the definition of the inverse Fourier series, (3.90b); (b) from (4.93); and (c) from (4.94), where the summation goes only over odd indices $k = 2m + 1$ and the terms at $\pm k$ combine into a cosine. The bandlimited function $y(t)$ is illustrated in Figure 4.37(b), while its samples are given in Figure 4.37(c). The best LS approximation in this case is $\hat{x}(t) = y(t)$.

If we sampled without first lowpass filtering with $g(-t)$, the samples would clearly not be the same as those in Figure 4.37(c).
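The following sketch, in the setting of Example 4.24 ($T = 1$, $k_h = 2$, $T_s = 0.2$), evaluates the best bandlimited approximation (4.95), compares its samples with samples of the raw triangle wave, and checks numerically that the approximation error has no Fourier components at $|k| \le k_h$, as Theorem 4.24 predicts. The midpoint-rule resolution is an arbitrary choice.

```python
import math

T = 1.0
kh = 2            # bandlimit to BL{-2, ..., 2}, as in Figure 4.37
Ts = 0.2          # k_s = T / T_s = 5 = 2 k_h + 1


def triangle(t):
    # (4.94): x(t) = 1/2 - |t| for |t| <= 1/2, extended with period 1
    t = (t + 0.5) % 1.0 - 0.5
    return 0.5 - abs(t)


def best_bandlimited(t):
    # (4.95): 1/4 + sum over odd k = 2m + 1 <= k_h of 2 cos(2 pi k t) / (pi k)^2
    s = 0.25
    for k in range(1, kh + 1, 2):
        s += 2.0 * math.cos(2 * math.pi * k * t) / (math.pi * k) ** 2
    return s


def error_projection(k, M=20000):
    # Midpoint rule for the integral over one period of (x - x_hat)(t) cos(2 pi k t); both
    # functions are even, so this equals the k-th Fourier coefficient of the error
    dt = T / M
    total = 0.0
    for i in range(M):
        t = -0.5 + (i + 0.5) * dt
        total += (triangle(t) - best_bandlimited(t)) * math.cos(2 * math.pi * k * t)
    return total * dt


samples_bl = [best_bandlimited(n * Ts) for n in range(-2, 3)]   # samples as in Figure 4.37(c)
samples_raw = [triangle(n * Ts) for n in range(-2, 3)]          # samples without prefiltering
print(samples_bl)
print(samples_raw)
print([error_projection(k) for k in range(4)])   # about 0 for k <= 2; X_3 = 1/(9 pi^2) for k = 3
```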

4.6 Stochastic Vectors and Processes 

When signals are realizations of stochastic processes, be it in discrete or in continuous time, it is possible to derive sampling theorems similar to the ones we have seen in the deterministic case. We will focus on the bandlimited case, both because of its practical importance and its parallelism to the deterministic setting we saw earlier. In our treatment, we assume WSS processes having a well-defined power spectral density. Furthermore, we require the power spectral density to be continuous, which holds, for example, when the autocorrelation is in $\ell^1(\mathbb{Z})$ (discrete time) or $L^1(\mathbb{R})$ (continuous time).
4.6.1 Finite-Dimensional Stochastic Vectors

Since the sampling and interpolation operators are linear, the effect of additive noise can be analyzed separately from any deterministic vector of interest. We start with vectors and consider the simple case of a white noise vector as in Section 2.8.1: an $M$-dimensional random vector $x = [x_0\ x_1\ \cdots\ x_{M-1}]^\top$ whose mean is zero and whose autocorrelation matrix is $A_x = E[xx^*] = \sigma^2 I$. How do sampling and interpolation influence the noise characteristics? It is easy to see that the zero-mean property is conserved; what about the correlation properties?
The output of sampling is $y = \Phi^* x$, and thus

$$A_y = E[yy^*] = E[\Phi^* x x^* \Phi] = \Phi^* E[xx^*]\, \Phi = \Phi^* A_x \Phi = \sigma^2 \Phi^*\Phi.$$

Thus, the samples $y$ are uncorrelated only when the rows of $\Phi^*$ are orthogonal; they additionally have equal variance if the rows of $\Phi^*$ are orthonormal.
The output of interpolation is $\hat{x} = \Phi x$, where now $x$ is an $N$-dimensional white noise vector with autocorrelation matrix $\sigma^2 I$, and thus

$$A_{\hat{x}} = E[\hat{x}\hat{x}^*] = E[\Phi x x^* \Phi^*] = \Phi E[xx^*]\, \Phi^* = \Phi A_x \Phi^* = \sigma^2 \Phi\Phi^*.$$

Since $\Phi\Phi^*$ is not of full rank (size $M \times M$ but rank $N < M$), $\hat{x}$ cannot be uncorrelated; no surprise, since $\hat{x}$ belongs to an $N$-dimensional subspace $S$.
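A quick Monte Carlo sketch of the relation $A_y = \sigma^2\Phi^*\Phi$: for two hypothetical real $2 \times 4$ sampling operators, one with orthonormal rows and one without, it estimates $E[yy^\top]$ from white Gaussian inputs. The operators, seed and number of trials are illustrative assumptions, and the estimates only approximate the exact matrices.

```python
import random


def sample_autocorrelation(phi_star, sigma=1.0, trials=10000, seed=1):
    # Monte Carlo estimate of A_y = E[y y^T] for y = Phi^* x, x real white Gaussian noise
    rng = random.Random(seed)
    n_out, m_in = len(phi_star), len(phi_star[0])
    acc = [[0.0] * n_out for _ in range(n_out)]
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(m_in)]
        y = [sum(row[m] * x[m] for m in range(m_in)) for row in phi_star]
        for i in range(n_out):
            for j in range(n_out):
                acc[i][j] += y[i] * y[j]
    return [[v / trials for v in row] for row in acc]


r = 2 ** -0.5
orthonormal_rows = [[r, r, 0.0, 0.0], [0.0, 0.0, r, r]]        # Phi^* Phi = I
correlated_rows = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # Phi^* Phi = [[1, 1], [1, 2]]

print(sample_autocorrelation(orthonormal_rows))   # approximately [[1, 0], [0, 1]]
print(sample_autocorrelation(correlated_rows))    # approximately [[1, 1], [1, 2]]
```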

A similar analysis can be done for the cascade of sampling and interpolation, in either order, showing that perfect reconstruction leads to uncorrelated outputs (see Solved Exercise 4.6).

4.6.2 Discrete Bandlimited Stochastic Processes 

As we said, we assume we are dealing with WSS processes $x_n$, that is, processes whose mean is constant and whose autocorrelation depends only on the lag, as in (2.224). Since we cannot take the DTFT of a discrete stochastic process, as its realizations are in general neither absolutely nor square summable, we make assessments based on averages (moments), such as the DTFT of the autocorrelation, that is, the power spectral density $A_x(e^{j\omega})$. We will assume this power spectral density to be continuous here.
To do a sampling and interpolation analysis, we use the results from Section 2.8, such as the power spectral density of the output after downsampling, (2.252), as well as the power spectral density of the output after upsampling, (2.253). We could now follow what we saw for vectors as well as in Section 4.3. We instead concentrate on the sampling theorem for discrete bandlimited stochastic processes, following on Theorem 4.7.

Theorem 4.25 (Sampling theorem for discrete stochastic processes) Given is the system as in Figure 4.14(b) with $x_n$ a discrete WSS process and interpolation postfilter $g_n$ from (4.32). Then,

$$x_n = \lim_{L\to\infty} \sum_{k=-L}^{L} x_{kN}\, \operatorname{sinc}\Big(\frac{\pi}{N}(n - kN)\Big) \quad\Longleftrightarrow\quad x_n \in BL\Big[-\frac{\pi}{N}, \frac{\pi}{N}\Big]. \qquad (4.96)$$
Figure 4.38: An AR-1 system followed by projection onto $BL[-\pi/2, \pi/2]$.



It is also clear that filtering with an ideal lowpass filter with cutoff frequency $\pi/N$ (impulse response proportional to $\operatorname{sinc}(\pi n/N)$) and then downsampling by $N$ before interpolating as in (4.96) computes the orthogonal projection of the input onto $BL[-\pi/N, \pi/N]$.

Example 4.25 (Bandlimiting an AR-1 process) Consider the system in Figure 4.38 with an i.i.d. WSS input $w_n$ with variance 1. We obtain an AR-1 process $x_n$ by filtering with a recursive filter as in (2.229). This stochastic process then goes through a sequence of sampling and interpolation, with both the prefilter and the postfilter being ideal halfband lowpass filters. This multirate system computes the projection of $x_n$ onto $BL[-\pi/2, \pi/2]$, as we now demonstrate.

The power spectral density of $x$ is $A_x(e^{j\omega})$ as computed in (2.234) (see Figure 4.39(a)),

$$A_x(e^{j\omega}) = \frac{1 - a^2}{|1 - a e^{-j\omega}|^2} = \frac{1 - a^2}{1 + a^2 - 2a\cos\omega}, \qquad |a| < 1.$$

This power spectral density is then bandlimited to $[-\pi/2, \pi/2]$ with an ideal halfband filter with gain $\sqrt{2}$, which multiplies the power spectral density by $|\sqrt{2}|^2 = 2$ in the passband (see Figure 4.39(b)),

$$A_{x'}(e^{j\omega}) = \begin{cases} \dfrac{2(1 - a^2)}{1 + a^2 - 2a\cos\omega}, & |\omega| < \pi/2; \\ 0, & \text{otherwise}. \end{cases}$$

This is followed by downsampling by 2. From (2.252) with $N = 2$, we get (see Figure 4.39(c)),

$$A_y(e^{j\omega}) = \frac{1 - a^2}{1 + a^2 - 2a\cos(\omega/2)}, \qquad |\omega| \le \pi.$$

Even though $\cos(\omega/2)$ is $4\pi$-periodic, $A_y(e^{j\omega})$ is $2\pi$-periodic. After upsampling by 2, from (2.253) we get (see Figure 4.39(d)),

$$A_u(e^{j\omega}) = \frac{1}{2} A_y(e^{j2\omega}) = \frac{1 - a^2}{2(1 + a^2 - 2a\cos\omega)},$$

and finally, ideal lowpass filtering with gain $\sqrt{2}$ produces the bandlimited version $\hat{x}$ of $x$ (see Figure 4.39(e)),

$$A_{\hat{x}}(e^{j\omega}) = \begin{cases} \dfrac{1 - a^2}{1 + a^2 - 2a\cos\omega}, & |\omega| < \pi/2; \\ 0, & \text{otherwise}. \end{cases}$$
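The spectral bookkeeping of Example 4.25 can be checked numerically. The sketch assumes $a = 0.8$, models each ideal halfband filter with gain $\sqrt{2}$ as multiplying the power spectral density by $|H|^2 = 2$ in the passband, and assumes the downsampling and upsampling rules take the standard forms $\tfrac12[A(e^{j\omega/2}) + A(e^{j(\omega-2\pi)/2})]$ and $\tfrac12 A_y(e^{j2\omega})$, which we take to be what (2.252) and (2.253) give for $N = 2$.

```python
import math

a = 0.8                      # assumed AR-1 parameter, |a| < 1


def A_x(w):
    # AR-1 power spectral density with numerator 1 - a^2, as in the example
    return (1 - a * a) / (1 + a * a - 2 * a * math.cos(w))


def wrap(w):
    # map a frequency to [-pi, pi)
    return (w + math.pi) % (2 * math.pi) - math.pi


def halfband(A, w):
    # ideal halfband lowpass with gain sqrt(2): the PSD is multiplied by |H|^2 = 2 in the passband
    return 2.0 * A(w) if abs(wrap(w)) < math.pi / 2 else 0.0


def prefiltered(w):
    return halfband(A_x, w)


def downsampled(w):
    # downsampling by 2 (assumed form of (2.252)): (1/2)[A(e^{j w/2}) + A(e^{j (w - 2 pi)/2})]
    return 0.5 * (prefiltered(w / 2) + prefiltered((w - 2 * math.pi) / 2))


def upsampled(w):
    # upsampling by 2 (assumed form of (2.253)): (1/2) A_y(e^{j 2 w})
    return 0.5 * downsampled(2 * w)


def postfiltered(w):
    return halfband(upsampled, w)


for w in (0.0, 0.7, 1.4, 1.7, 3.0):
    print(w, postfiltered(w), A_x(w))   # equal in the passband, zero outside it
```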
Figure 4.39: Bandlimiting an AR-1 process as in Figure 4.38. (a) The AR-1 process, followed by (b) prefiltering, (c) downsampling by 2, (d) upsampling by 2, and (e) postfiltering. ($c = (1 + a)/(1 - a)$.)



We now verify that $\hat{x}$ is the best least-squares approximation of $x$ in $BL[-\pi/2, \pi/2]$. From the projection theorem, Theorem 1.26, one way of showing this is proving that the approximation error $x - \hat{x}$ is orthogonal to $BL[-\pi/2, \pi/2]$.

From Table 2.10, we know that if $x$ is WSS, then the result of filtering followed by downsampling will be WSS as well. Then, that WSS input into upsampling followed by filtering produces a WSCS$_2$ output. We have defined orthogonality for WSS processes in Definition 2.17 and are allowed to compute a power spectral density for WSS processes only; we thus need to prove orthogonality separately for the individual polyphase components of $\hat{x}$, each of which is WSS.
Take the first polyphase component, $\hat{x}_0$:

$$E\big[(x_{2n} - \hat{x}_{2n})\, \hat{x}_{2n-m}\big] \overset{(a)}{=} c_{x,\hat{x},2n,m} - a_{\hat{x},2n,m} \overset{(b)}{=} c_{x,\hat{x}_0,m} - a_{\hat{x}_0,m},$$

where (a) follows from (2.223) and the fact that $x$ is real; and (b) from $\hat{x}_0$ being WSS.

TBD: To be finished. 

4.6.3 Continuous Bandlimited Stochastic Processes 

As for discrete stochastic processes, we assume we are dealing with WSS processes $x(t)$, that is, continuous processes whose mean is constant and whose autocorrelation depends only on the lag $\tau$, as in (3.119). Since we cannot take a Fourier transform of a continuous stochastic process, as its realizations are in general neither absolutely nor square integrable, we make assessments based on averages (moments), such as the Fourier transform of the autocorrelation, that is, the power spectral density $A_x(\omega)$. We will assume this power spectral density to be continuous here.

We could now follow what we saw for functions in Section 4.4. We instead concentrate on the sampling theorem for continuous bandlimited stochastic processes, following on Theorem 4.14. When $x(t)$ is bandlimited to $[-\pi/T, \pi/T]$, the power spectral density of $y_n = x(nT)$ is given by

$$A_y(e^{j\omega}) = \frac{1}{T} A_x\Big(\frac{\omega}{T}\Big), \qquad \omega \in [-\pi, \pi]. \qquad (4.97)$$



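Theorem 4.26 (Sampling theorem for continuous stochastic processes) Given is the system as in Figure 4.4(b) with $x(t)$ a continuous WSS process and interpolation postfilter $g(t)$ from (4.57). Then,

$$x(t) = \lim_{L\to\infty} \sum_{n=-L}^{L} x(nT)\, \operatorname{sinc}\Big(\frac{\pi}{T}(t - nT)\Big) \quad\Longleftrightarrow\quad x(t) \in BL\Big[-\frac{\pi}{T}, \frac{\pi}{T}\Big]. \qquad (4.98)$$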



The proof of the theorem is somewhat technical; we omit it here and give pointers 
in Further Reading. 

4.7 Computational Aspects 

4.7.1 Projection Onto Convex Sets 

Given $x$, the closest point to it in a closed subspace $S$ is unique and given by the orthogonal projection $\hat{x}$. More generally, given $x$ in a Hilbert space and a closed convex subset $S$, the closest point to $x$ in $S$, called $\hat{x}$, exists and is unique. This fact allows us to find points belonging to the intersection of convex sets by an iterative algorithm called projection onto convex sets (POCS). Intuitively, instead of trying to satisfy all membership constraints at once, one satisfies one constraint at a time; because of the convexity of the sets, the procedure is guaranteed to converge (in infinite-dimensional spaces, in general only weakly) to a point, not necessarily unique, that belongs to the intersection of all the convex sets, provided this intersection is nonempty (see Figure 4.40).

Figure 4.40: Iterative solution of convex constraints using POCS. By iteratively projecting to the closest point, a solution belonging to the intersection is found. (a) The convex sets are subspaces; the solution is unique. (b) Intersection of a general convex set and a subspace; there is a set of possible solutions. (c) Intersection of two general convex sets; there is a set of possible solutions.

Figure 4.41: Illustration of the Papoulis-Gerchberg algorithm on a bandlimited Fourier series of periodic functions with period 1 and partial observation. The two convex sets are $BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$ and the set of functions satisfying $\hat{x}(t) = x(t)$ for $t \in (\alpha, \beta)$.

POCS-type algorithms are often used in problems involving bandlimited signals, which form a subspace and have to satisfy some other convex constraint. They are also popular because they are simple and allow one to deal with large problems. We now discuss a representative example.

Papoulis-Gerchberg Algorithm We discuss two instances. We first look into bandlimited Fourier series with partial observation. Let $x(t) \in BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$ be a periodic function of period 1. Let this function be observed only over a subinterval of $[-1/2, 1/2)$, for example, $[\alpha, \beta)$, with $\alpha, \beta \in (-1/2, 1/2)$.

The Papoulis-Gerchberg algorithm alternates between enforcing two convex constraints:

(i) $\hat{x}(t) \in BL\{-(k_0-1)/2, \ldots, (k_0-1)/2\}$, by truncating the Fourier series as in (4.93); and
(ii) $\hat{x}(t) = x(t)$ for $t \in (\alpha, \beta)$; for the missing values, leave $\hat{x}(t)$ as is.

Figure 4.42: Papoulis-Gerchberg algorithm on a bandlimited Fourier series of an example periodic function with period 1 and partial observation. (a) Input spectrum. (b) Result after 20 iterations. (c) Residual error.



Figure 4.41 illustrates the procedure. An example of $x(t)$ with a bandlimited Fourier series where only an interval is actually observed is shown in Figure 4.42. Note the slow convergence at the boundary of the observation interval in Figure 4.42(b).
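The two constraints can be sketched on sampled data: the function below works on $N$ samples of one period, enforces (i) by zeroing all DFT coefficients outside $|k| \le k_h$ and (ii) by reinstating the observed samples. This is a discrete-time illustration under assumed parameters ($N = 32$, $k_h = 3$, six consecutive missing samples), not the continuous-time algorithm itself, and it uses a plain $O(N^2)$ DFT to stay dependency-free.

```python
import cmath
import math


def dft_matrix(N):
    # W[k][n] = e^{-j 2 pi k n / N}
    return [[cmath.exp(-2j * math.pi * k * n / N) for n in range(N)] for k in range(N)]


def bandlimit(x, kh, W):
    # Constraint (i): keep only DFT coefficients with |k| <= kh (index k > N/2 stands for k - N)
    N = len(x)
    X = [sum(W[k][n] * x[n] for n in range(N)) if min(k, N - k) <= kh else 0j for k in range(N)]
    # Inverse DFT with conjugate twiddles; the result is real up to rounding for real x
    return [sum(W[k][n].conjugate() * X[k] for k in range(N)).real / N for n in range(N)]


def papoulis_gerchberg(observed, known, kh, iters=400):
    W = dft_matrix(len(observed))
    # Initial guess: observed samples, zeros where samples are missing
    x = [o if k else 0.0 for o, k in zip(observed, known)]
    for _ in range(iters):
        x = bandlimit(x, kh, W)
        # Constraint (ii): reinstate the observed samples, keep the estimate elsewhere
        x = [o if k else v for o, k, v in zip(observed, known, x)]
    return x


N, kh = 32, 3
truth = [1 + math.cos(2 * math.pi * n / N) + 0.5 * math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
known = [not (10 <= n < 16) for n in range(N)]          # samples 10, ..., 15 are missing
observed = [t if k else float("nan") for t, k in zip(truth, known)]
estimate = papoulis_gerchberg(observed, known, kh)
print(max(abs(e - t) for e, t in zip(estimate, truth)))  # small: the missing samples are recovered
```

Enlarging the missing stretch or the band makes the per-iteration contraction factor approach 1, which is the slow convergence discussed below.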

The same algorithm can be used for images, where a part of the image is missing or corrupted. This is also called the inpainting problem.

Let $x_{n_1,n_2}$ be an $N \times N$ image, and assume its DFT $X_{k_1,k_2}$ is bandlimited to $k_{0,1} \times k_{0,2}$ lowpass coefficients. Assume a region of the image is missing, and this region has $M$ pixels, where $M < N^2 - k_{0,1}k_{0,2}$. Because the number of unknown DFT coefficients, $k_{0,1}k_{0,2}$, is smaller than the number of measurements, $N^2 - M$, the solution is in general uniquely specified (but can be ill-conditioned). The Papoulis-Gerchberg algorithm alternates between enforcing two convex constraints as before:



(i) $\hat{x}_{n_1,n_2}$ is bandlimited, by truncating the DFT; and
(ii) $\hat{x}_{n_1,n_2} = x_{n_1,n_2}$ for those $n_1, n_2$ for which $x_{n_1,n_2}$ is known; for the missing values, leave $\hat{x}_{n_1,n_2}$ as is.



Geometrically, the situation is again as in Figure 4.41. We show an example in Figure 4.43: an image bandlimited to 1/2 of the original bandwidth is shown in Figure 4.43(b), the same image with missing stripes of total size $N^2/10$ pixels in Figure 4.43(c), and the reconstruction in Figure 4.43(d) (after a few iterations; ultimately, the reconstruction will be perfect). In this case, the Papoulis-Gerchberg algorithm looks like magic, since it recovers the missing woman.

The Papoulis-Gerchberg algorithm converges to a unique solution only when specific conditions are satisfied. For example, the number of unknowns (the missing part of the signal in the time domain) should be equal to or less than the number of equations (the known part of the signal in the frequency domain). Let $x \in \mathbb{R}^N$ be a discrete-time signal with DFT $X \in \mathbb{C}^N$, and assume that $x$ is only partially observed.
Figure 4.43: Papoulis-Gerchberg algorithm on a bandlimited DFT of an example image with partial observation. (b) Image bandlimited to half bandwidth. (c) Image with partial observation. (d) Reconstructed image.



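Partition $x$ and $X$ as

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}, \qquad X = \begin{bmatrix} X_0 \\ X_1 \\ X_2 \end{bmatrix},$$

with $x_0$ of dimension $s$, $x_1$ of dimension $q$, and $x_2$ of dimension $N - s - q$, as well as $X_0$ of dimension $N/2 - k$, $X_1$ of dimension $2k + 1$, and $X_2$ of dimension $N/2 - k - 1$, so that $X_1$ holds the $2k + 1$ lowest frequencies $-k, \ldots, k$. With the above partitioning, $X$ can be written as

$$\begin{bmatrix} X_0 \\ X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} F_{00} & F_{01} & F_{02} \\ F_{10} & F_{11} & F_{12} \\ F_{20} & F_{21} & F_{22} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}.$$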

The discrete version of the Papoulis-Gerchberg algorithm solves the following problem: assuming $x_0$ and $x_2$ are observed and $X_0$ and $X_2$ are known (they are zero, since the signal is bandlimited), find $X_1$ and $x_1$. Note that for the algorithm to converge to a unique solution we need $q \le N - 2k - 1$.

Call the missing parts at iteration $n$ $x_1^{(n)}$ and $X_1^{(n)}$. Then the next iteration will produce

$$X_1^{(n+1)} = \begin{bmatrix} F_{10} & F_{11} & F_{12} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1^{(n)} \\ x_2 \end{bmatrix}, \qquad x_1^{(n+1)} = \frac{1}{N} \begin{bmatrix} F_{01}^* & F_{11}^* & F_{21}^* \end{bmatrix} \begin{bmatrix} X_0 \\ X_1^{(n+1)} \\ X_2 \end{bmatrix}.$$

Call $Q = F_{11}F_{11}^*/N$ and $P = F_{11}^*F_{11}/N$; then, the updates are

$$X_1^{(n+1)} = (I - Q)X_1 + Q X_1^{(n)}, \qquad x_1^{(n+1)} = (I - P)x_1 + P x_1^{(n)},$$

where we have used that $F_{10}x_0 + F_{12}x_2 = X_1 - F_{11}x_1$ and $(1/N)(F_{01}^*X_0 + F_{21}^*X_2) = x_1 - (1/N)F_{11}^*X_1$. With the initial condition $X_1^{(0)} = 0$,

$$X_1^{(n)} = (I - Q^n)X_1. \qquad (4.99)$$

From (4.99), we see that the convergence depends on the operator norm of the matrix $Q$ (its largest eigenvalue). For example, if one chooses $N = 32$, $s = 7$, $q = 15$, and $k = 6$, the largest eigenvalue of $Q$ is 0.999999998, causing the algorithm to converge slowly.
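The slow convergence can be quantified by computing the largest eigenvalue of $P = F_{11}^*F_{11}/N$, which has the same nonzero eigenvalues as $Q$. For the band $-k, \ldots, k$ and $q$ consecutive missing samples, $P$ is a real $q \times q$ matrix of Dirichlet-kernel values, so a plain power iteration suffices. The sketch below is illustrative and does not attempt to reproduce the quoted digits.

```python
import math


def missing_block_gram(N, k, s, q):
    # P = F_11^* F_11 / N for missing time indices s, ..., s + q - 1 and the band -k, ..., k;
    # entry (m, n) is (1/N) sum_{|f| <= k} e^{j 2 pi f (m - n) / N}, which is real
    idx = range(s, s + q)
    return [[sum(math.cos(2 * math.pi * f * (m - n) / N) for f in range(-k, k + 1)) / N
             for n in idx] for m in idx]


def largest_eigenvalue(A, iters=3000):
    # Power iteration on a symmetric positive semidefinite matrix; returns a Rayleigh quotient
    size = len(A)
    v = [1.0] * size
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    Av = [sum(A[i][j] * v[j] for j in range(size)) for i in range(size)]
    return sum(a * b for a, b in zip(v, Av))


lam = largest_eigenvalue(missing_block_gram(N=32, k=6, s=7, q=15))
print(lam, 1 - lam)   # extremely close to 1, so Q^n X_1 decays very slowly
```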
Chapter at a Glance

We now summarize the main concepts we have seen in this chapter. They all revolve around sampling, interpolation, and their combinations, between a larger space and a smaller space. In Section 4.2, this larger space is the space of vectors $\mathbb{C}^M$, while the smaller space is $\mathbb{C}^N$, with $N < M$; in Section 4.3, the larger space is the space of sequences $\ell^2(\mathbb{Z})$ and the smaller is its subspace; and in Section 4.4, the larger space is the space of functions $L^2(\mathbb{R})$ and the smaller is the space of sequences $\ell^2(\mathbb{Z})$. By cascading interpolation followed by sampling, or sampling followed by interpolation, we are able to move from one space to the next and back. It is the match between sampling and interpolation that determines the type of recovery possible in each case. These concepts were illustrated in a simple example in Figures 4.6–4.8.

Sampling and Interpolation with Orthonormal Vectors/Sequences/Functions

(i) The sampling operator $\Phi^*$ takes an input $x$ from a larger space and maps it into an output $y$ in a smaller space. We assume orthonormality, that is,

$$\Phi^*\Phi = I.$$

We call $S$ the orthogonal complement of the null space of $\Phi^*$,

$$S = \mathcal{N}(\Phi^*)^\perp.$$

Inputs $x \in \mathcal{N}(\Phi^*)$ are thus mapped to 0, while inputs $x \in S$ can be recovered. For inputs $x \notin S$, the component in $\mathcal{N}(\Phi^*)$ is lost due to sampling, while the other component is preserved through $\Phi^* x$.



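Sampling $y = \Phi^* x$
input space | input | output | output space
$\mathbb{C}^M$ | $x$ | $y$ | $\mathbb{C}^N$
$\ell^2(\mathbb{Z})$ | $x_n$ | $y_n$ | $\ell^2(\mathbb{Z})$
$L^2(\mathbb{R})$ | $x(t)$ | $y_n$ | $\ell^2(\mathbb{Z})$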


(ii) The interpolation operator $\Phi$ takes an input $y$ from a smaller space and maps it into an output $\hat{x}$ in a larger space. We call $S$ the range of the interpolation operator $\Phi$,

$$S = \mathcal{R}(\Phi).$$



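Interpolation $\hat{x} = \Phi y$
input space | input | output | output space $S = \mathcal{R}(\Phi)$
$\mathbb{C}^N$ | $y$ | $\hat{x}$ | $S \subset \mathbb{C}^M$
$\ell^2(\mathbb{Z})$ | $y_n$ | $\hat{x}_n$ | $S \subset \ell^2(\mathbb{Z})$
$\ell^2(\mathbb{Z})$ | $y_n$ | $\hat{x}(t)$ | $S \subset L^2(\mathbb{R})$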



(iii) Interpolation followed by sampling leads to perfect recovery because of the assumption of orthonormality.

Interpolation followed by sampling $\hat{y} = \Phi^*\Phi y$
input | output | reconstruction | property
$y$ | $\hat{y} = y$ | perfect | $\Phi^*\Phi = I$
(iv) Sampling followed by interpolation will recover the input perfectly only when $x \in S$. Because of the choice of sampling and interpolation operators, $P = \Phi\Phi^*$ is an orthogonal projection operator.

Sampling followed by interpolation $\hat{x} = \Phi\Phi^* x$
input | output | reconstruction | property
$x \in S$ | $\hat{x} = x$ | perfect | $\Phi\Phi^* x = x$
$x \notin S$ | $\hat{x}$ | orthogonal projection | $P = \Phi\Phi^*$

Sampling and Interpolation with Nonorthogonal Vectors/Sequences/Functions

(i) The sampling operator $\tilde{\Phi}^*$ takes an input $x$ from a larger space and maps it into an output $y$ in a smaller space. $\tilde{S}$ is the orthogonal complement of the null space of $\tilde{\Phi}^*$,

$$\tilde{S} = \mathcal{N}(\tilde{\Phi}^*)^\perp.$$

Inputs $x \in \mathcal{N}(\tilde{\Phi}^*)$ are thus mapped to 0, while inputs $x \in \tilde{S}$ can be recovered. For inputs $x \notin \tilde{S}$, the component in $\mathcal{N}(\tilde{\Phi}^*)$ is lost due to sampling, while the other component is preserved through $\tilde{\Phi}^* x$.
(ii) The interpolation operator $\Phi$ takes an input $y$ from a smaller space and maps it into an output $\hat{x}$ in a larger space, as in the orthonormal case. We call $S$ the range of the interpolation operator $\Phi$,

$$S = \mathcal{R}(\Phi).$$

(iii) Interpolation followed by sampling will not always be the identity; when it is, interpolation and sampling are called consistent, and the input is perfectly recovered.

Interpolation followed by sampling $\hat{y} = \tilde{\Phi}^*\Phi y$
input | output | reconstruction | property
$y$ | $\hat{y} = y$ | perfect | consistent, $\tilde{\Phi}^*\Phi = I$
$y$ | $\hat{y}$ | not perfect | $\tilde{\Phi}^*\Phi \ne I$

(iv) Sampling followed by interpolation will recover the input perfectly only when $x \in S$. When the interpolation operator is the pseudoinverse of the sampling one,

$$\Phi = \tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1},$$

sampling and interpolation are ideally matched. If, moreover, they are consistent, $P = \Phi\tilde{\Phi}^*$ is an orthogonal projection operator. When they are not ideally matched, but only consistent, $P$ is a projection operator.

Sampling followed by interpolation $\hat{x} = \Phi\tilde{\Phi}^* x$
input | output | reconstruction | property
$x \in S$ | $\hat{x} = x$ | perfect | consistent, $\tilde{\Phi}^*\Phi = I$
$x \notin S$ | $\hat{x}$ | orthogonal projection | ideally matched, $\Phi = \tilde{\Phi}(\tilde{\Phi}^*\tilde{\Phi})^{-1}$; consistent, $\tilde{\Phi}^*\Phi = I$
$x \notin S$ | $\hat{x}$ | projection | consistent, $\tilde{\Phi}^*\Phi = I$



Throughout, $\Phi$ and $\tilde{\Phi}$ could exchange roles; $\Phi^*$ could be used for sampling and $\tilde{\Phi}$ for interpolation.
Historical Remarks

The sampling theorem for bandlimited functions has an interesting history. Long before computers, various functions were tabulated for a set of values in the domain, thus raising the question of interpolation between these points. The use of sinc interpolation for bandlimited functions was first shown by Edmund T. Whittaker (1873-1956), a British mathematician, in 1915. Harry Nyquist (1889-1976), a Swedish-born American electrical engineer, was interested in signaling over bandlimited channels and formulated the celebrated criterion bearing his name in 1928: sampling at twice the maximum frequency uniquely specifies a bandlimited function. In the Russian literature, Vladimir A. Kotelnikov (1908-2005), an information theoretician working in the Soviet Union, proved the sampling theorem independently in 1933 (this is not the only result he and Shannon proved without having been aware of each other). John M. Whittaker (1905-1984), the son of the initial contributor, contributed further results to the interpolation theory on which his father worked. Meanwhile, Herbert P. Raabe (1909-2004), a German electrical engineer, wrote a dissertation in 1939 stating and proving sampling results for bandlimited functions. In 1949, Someya in Japan also proved the sampling theorem.

In signal processing and communications, Claude E. Shannon (1916-2001), an American mathematician and engineer, is the one most often connected to the sampling theorem; it often bears his name. In 1948, in his landmark treatise A Mathematical Theory of Communication, Shannon formulated the sampling theorem as the first step in digital communications. He did not claim it as his own, however, stating "this is a fact which is common knowledge in the communication art"; Shannon was aware of other formulations, by Whittaker, for example. Apart from the sampling theorem, Shannon is considered the father of information theory, was an accomplished cryptographer, and made numerous contributions to game theory. Shannon's MS thesis established the design of digital circuits using Boolean algebra, that is, the foundation of digital design. He was in contact and collaborated with great mathematicians and engineers of his time; he worked in some of the hallowed institutions of the time: the Institute for Advanced Study in Princeton, NJ, Bell Labs in Murray Hill, NJ, and MIT in Cambridge, MA.




Further Reading 

Many books and textbooks cover sampling theory and its applications in signal processing and communications, for example, [102]. We also recommend review papers by Jerri [80] and Unser [155]; the latter one, in particular, develops sampling in shift-invariant subspaces.

The basic paper on multichannel sampling as well as the source of Theorem 4.17 is [111]. Nonuniform sampling in its general form is more difficult. Suffice it to say that, on average, the sampling density has to be at least the Nyquist rate, and the samples have to be evenly spread (for example, no accumulation points).

For stationary bandlimited processes, the proof of the sampling theorem goes back to Lloyd [96]; contemporary versions can be found in [175, 113, 88].

For convergence behavior of the Papoulis-Gerchberg algorithm when applied to discrete signals, see [82].
Exercises with Solutions

4.1. Approximation by Piecewise-Constant Functions

Consider the interval $[0, 1)$ and the linear function $x(t) = t$. For any $k \in \mathbb{N}$, let $x_k(t)$ be the least-squares approximation of $x(t)$ among functions that are piecewise constant over intervals $[n2^{-k}, (n+1)2^{-k})$, $0 \le n < 2^k$. Compute $\|x(t) - x_k(t)\|^2$ and comment on its behavior as $k \to \infty$.

Solution: The set $\{\varphi_{k,n}(t) = 2^{k/2}\chi_{[n2^{-k},(n+1)2^{-k})}(t)\}_{n=0}^{2^k-1}$ forms an orthonormal basis for piecewise constant functions over $[0, 1)$. We can then find $x_k(t)$ as

$$ x_k(t) = \sum_{n=0}^{2^k-1} \langle x, \varphi_{k,n}\rangle\, \varphi_{k,n}(t) = 2^{-k-1}\sum_{n=0}^{2^k-1} (2n+1)\,\chi_{[n2^{-k},(n+1)2^{-k})}(t). $$

So, for example, for $k = 0$, $x_0(t) = 1/2$ over $[0, 1)$; for $k = 1$, $x_1(t) = 1/4$ over $[0, 1/2)$ and $x_1(t) = 3/4$ over $[1/2, 1)$; and so on.

Because the basis functions do not overlap, we can compute the error for a single interval and then sum up the individual errors:

$$ \int_0^1 (x(t) - x_k(t))^2\, \chi_{[n2^{-k},(n+1)2^{-k})}(t)\, dt = \frac{1}{12}\, 2^{-3k}; $$

in other words, for a given $k$, the error between $x(t)$ and $x_k(t)$ on each interval does not depend on $n$. Because there are $2^k$ intervals, the total error is

$$ \|x(t) - x_k(t)\|^2 = 2^k\, \frac{1}{12}\, 2^{-3k} = \frac{1}{12}\, 2^{-2k}; $$

as $k \to \infty$, the error goes to 0, decreasing by a factor of 4 each time $k$ increases by 1.
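The closed form $\|x - x_k\|^2 = 2^{-2k}/12$ can be checked numerically. The sketch below (function names are illustrative, not from the text) evaluates $x_k(t)$ from the expansion above and integrates the squared error with a midpoint rule whose nodes are aligned with the dyadic intervals.

```python
import numpy as np

def ls_piecewise_constant(t, k):
    """x_k(t) = 2^(-k-1) (2n + 1) on the n-th dyadic interval [n 2^-k, (n+1) 2^-k)."""
    n = np.floor(t * 2**k)                 # index of the interval containing t
    return (2 * n + 1) * 2.0 ** (-k - 1)

def squared_error(k, m=2**16):
    """Midpoint-rule estimate of ||x - x_k||^2 for x(t) = t on [0, 1)."""
    t = (np.arange(m) + 0.5) / m           # nodes never fall on a dyadic boundary
    return np.sum((t - ls_piecewise_constant(t, k)) ** 2) / m

for k in range(5):
    print(k, squared_error(k), 2.0 ** (-2 * k) / 12)   # numerical vs. closed form
```

The midpoint rule slightly underestimates the integral of the quadratic error on each interval, with relative error $1/m_i^2$ for $m_i$ nodes per interval, which is negligible here.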

4.2. Correcting for Inconsistent Sampling and Interpolation

Given are the sampling operator $\tilde{\Phi}^*$ and interpolation operator $\Phi$ in $\mathbb{C}^N$, such that $D = \tilde{\Phi}^*\Phi \neq I$. We call $C$ a correction operator when $P = \Phi C \tilde{\Phi}^*$ is a projection operator. Find $C$ in terms of $D$ for it to be a correction operator and note any restrictions on $D$.

Solution: Our goal is to choose $C$ so that $P = \Phi C \tilde{\Phi}^*$ is idempotent,

$$ P^2 = (\Phi C \tilde{\Phi}^*)(\Phi C \tilde{\Phi}^*) = \Phi C D C \tilde{\Phi}^*; $$

idempotency is achieved when $CDC = C$, which holds for $C = D^{-1}$. This correction is possible if and only if $D$ is invertible.
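A quick numerical sanity check of this correction, as a sketch assuming real-valued operators (so $\tilde{\Phi}^*$ becomes a transpose) and randomly drawn, hypothetical $\Phi$ and $\tilde{\Phi}$ of illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 3                               # signal length and number of samples (illustrative)
Phi = rng.standard_normal((N, M))         # interpolation operator, maps R^M -> R^N
Phi_tilde = rng.standard_normal((N, M))   # sampling operator is Phi_tilde^T (real case of Phi_tilde^*)

D = Phi_tilde.T @ Phi                     # D = Phi_tilde^* Phi, generally not the identity
C = np.linalg.inv(D)                      # correction C = D^{-1}; requires D invertible
P = Phi @ C @ Phi_tilde.T                 # corrected sampling-interpolation cascade

print(np.allclose(D, np.eye(M)))          # False: sampling and interpolation are inconsistent
print(np.allclose(C @ D @ C, C))          # CDC = C
print(np.allclose(P @ P, P))              # P is idempotent, hence a projection
```

Note that $P$ is in general an oblique projection; it is orthogonal only when $\tilde{\Phi}$ and $\Phi$ span the same subspace.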

4.3. Null Space of Sampling Operator

Let the sampling operator $\tilde{\Phi}^*$ be defined as in (4.35), with $x_n \in \ell^2(\mathbb{Z})$.

(i) Find its null space.

(ii) Show that both $S$, the range of the interpolation operator $\Phi$ from (4.28), as well as $\tilde{S}$, the orthogonal complement of the null space of the sampling operator $\tilde{\Phi}^*$ from (4.35), are shift-invariant subspaces with respect to shift $N$.

Solution:

(i) The sampling operator is given by

$$ (\tilde{\Phi}^* x)_n = \langle x_k, \tilde{\varphi}_{k-Nn}\rangle_k. $$

Thus, the null space of this operator is given by

$$ \mathcal{N}(\tilde{\Phi}^*) = \{x \in \ell^2(\mathbb{Z}) \mid \langle x_k, \tilde{\varphi}_{k-Nn}\rangle_k = 0,\ n \in \mathbb{Z}\}. $$

(ii) From (4.28),

$$ S = \mathcal{R}(\Phi) = \{x \in \ell^2(\mathbb{Z}) \mid x_n = \langle y_k, \varphi_{n-Nk}\rangle_k,\ n \in \mathbb{Z}\}. $$

Clearly, this space is invariant under shift $N$, since

$$ x_{n-N\ell} = \langle y_k, \varphi_{n-Nk-N\ell}\rangle_k $$

still belongs to $S$ (it is generated by the shifted coefficients $y_{k-\ell}$). Similarly, because

$$ \tilde{S} = \mathcal{N}(\tilde{\Phi}^*)^{\perp} = \Big\{y \in \ell^2(\mathbb{Z}) \;\Big|\; y_n = \sum_{k\in\mathbb{Z}} \alpha_k \tilde{\varphi}_{n-Nk},\ n \in \mathbb{Z}\Big\}, $$

$\tilde{S}$ is invariant under shift $N$, since $y_{n-N\ell}$ still belongs to $\tilde{S}$.
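The shift-invariance arguments can be illustrated in a finite setting. The sketch below is an assumption-laden stand-in for $\ell^2(\mathbb{Z})$: signals of length $L = NP$ with circular shifts, real-valued random sequences `phi` and `phi_tilde` (hypothetical, not from the text), and the sampling operator written as a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 3, 5          # shift N and number of shifts in one period (illustrative)
L = N * P            # length of the circular stand-in for l2(Z)

def shift_matrix(g):
    """Matrix whose k-th column is the circular shift g_{n - N k}, k = 0, ..., P - 1."""
    return np.stack([np.roll(g, N * k) for k in range(P)], axis=1)

phi = rng.standard_normal(L)         # hypothetical interpolation sequence
phi_tilde = rng.standard_normal(L)   # hypothetical sampling sequence
Phi = shift_matrix(phi)              # interpolation: x_n = sum_k y_k phi_{n - N k}
Phi_tilde = shift_matrix(phi_tilde)  # sampling: (Phi_tilde^T x)_k = sum_n x_n phi_tilde_{n - N k}

# S = range(Phi): shifting x = Phi y by N equals Phi applied to y shifted by 1.
y = rng.standard_normal(P)
x = Phi @ y
range_invariant = np.allclose(np.roll(x, N), Phi @ np.roll(y, 1))

# Null space of the sampling operator, and its invariance under shift N.
null_basis = np.linalg.svd(Phi_tilde.T)[2][P:]    # rows span N(Phi_tilde^T)
z = null_basis.T @ rng.standard_normal(L - P)     # a random element of the null space
null_invariant = (np.allclose(Phi_tilde.T @ z, 0)
                  and np.allclose(Phi_tilde.T @ np.roll(z, N), 0))
print(range_invariant, null_invariant)
```

Since the circular shift by $N$ is unitary, invariance of the null space also implies invariance of its orthogonal complement, the finite analog of $\tilde{S}$.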

4.4. Interpolation of Oversampled Signals

Assume a function $x(t) \in \mathrm{BL}[-\pi, \pi]$. If the sampling frequency is chosen at the Nyquist rate, $\omega_s = 2\pi$, the interpolation filter is the usual ideal filter, the sinc function $g(t)$ from (4.57) with $T = 1$, which decays slowly, as $1/t$. If $x(t)$ is oversampled, then filters with faster decay can be used for interpolating $x(t)$ from its samples. Such filters are obtained by convolving ideal filters in frequency as in Example 4.17.

(i) Let $\omega_s = 3\pi$. Give the expression for $g_2(t) = h_1(t)h_2(t)$, where $h_1(t)$ and $h_2(t)$ are ideal filters with cut-off frequencies $\omega_1/2$ and $\omega_2/2$, respectively. Find what $\omega_1$ and $\omega_2$ must be, and verify that $g_2(t)$ decays as $1/t^2$.

(ii) Let $\omega_s = 4\pi$. Give the expression for $g_3(t) = h_1(t)(h_2(t))^2$, with the same cut-off frequencies as in (i). Find what $\omega_1$ and $\omega_2$ must be, and verify that $g_3(t)$ decays as $1/t^3$. Show that $G_3(\omega)$ has a continuous derivative.

(iii) Generalize the construction in (ii). Let $\omega_s = (i+1)\pi$. Give the expression for $g_i(t) = h_1(t)(h_2(t))^{i-1}$, with the same cut-off frequencies as in (i). Find what $\omega_1$ and $\omega_2$ must be, and verify that $g_i(t)$ decays as $1/t^i$. Show that $G_i(\omega)$ has a continuous $(i-2)$th derivative.

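Solution: Because $x(t) \in \mathrm{BL}[-\pi, \pi]$, from (4.54) its spectrum sampled at $\omega_s = k\pi$ will occupy the following set of frequencies:

$$ \ldots \cup [-(k+1)\pi, -(k-1)\pi] \cup [-\pi, \pi] \cup [(k-1)\pi, (k+1)\pi] \cup \ldots \qquad \text{(E4.4-1)} $$

(i) From (E4.4-1), we see that with $\omega_s = 3\pi$, $G_2(\omega)$ must be of the form:

$$ G_2(\omega) = \begin{cases} 1, & |\omega| \le \pi; \\ 0, & |\omega| \ge 2\pi. \end{cases} \qquad \text{(E4.4-2)} $$

Since $G_2(\omega) = (H_1 * H_2)(\omega)$ is the convolution of two boxes of half-widths $\omega_1/2$ and $\omega_2/2$, it is a trapezoid that is flat for $|\omega| \le (\omega_1 - \omega_2)/2$ and vanishes for $|\omega| \ge (\omega_1 + \omega_2)/2$. We can thus use (E4.4-2) to find the cut-off frequencies $\omega_1$ and $\omega_2$:

$$ \frac{\omega_1}{2} + \frac{\omega_2}{2} = 2\pi, \qquad \frac{\omega_1}{2} - \frac{\omega_2}{2} = \pi, $$

yielding $\omega_1 = 3\pi$ and $\omega_2 = \pi$, or, from (4.57),

$$ g_2(t) = \operatorname{sinc}\Big(\frac{3\pi}{2}t\Big)\operatorname{sinc}\Big(\frac{\pi}{2}t\Big); $$

this function clearly decays as $1/t^2$.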
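(ii) From (E4.4-1), we see that with $\omega_s = 4\pi$, $G_3(\omega)$ must be of the form:

$$ G_3(\omega) = \begin{cases} 1, & |\omega| \le \pi; \\ 0, & |\omega| \ge 3\pi. \end{cases} \qquad \text{(E4.4-3)} $$

Since $G_3(\omega) = (H_1 * H_2 * H_2)(\omega)$, we can use (E4.4-3) to find the cut-off frequencies $\omega_1$ and $\omega_2$:

$$ \frac{\omega_1}{2} + 2\,\frac{\omega_2}{2} = 3\pi, \qquad \frac{\omega_1}{2} - 2\,\frac{\omega_2}{2} = \pi, $$

yielding $\omega_1 = 4\pi$ and $\omega_2 = \pi$, or, from (4.57),

$$ g_3(t) = \operatorname{sinc}(2\pi t)\Big(\operatorname{sinc}\Big(\frac{\pi}{2}t\Big)\Big)^2; $$

this function clearly decays as $1/t^3$.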
To show that $G_3(\omega)$ has a continuous derivative, it is easier to consider it in the time domain first and then transform it back to the frequency domain. From (3.61a), $-jt\,g_3(t)$ and $dG_3(\omega)/d\omega$ form a Fourier-transform pair:

$$ -jt\, g_3(t) \;\longleftrightarrow\; \frac{dG_3(\omega)}{d\omega}. $$

The scaling constant $-j$ does not affect continuity, so we ignore it:

$$ t\, g_3(t) = \frac{1}{2\pi}\sin(2\pi t)\Big(\operatorname{sinc}\Big(\frac{\pi}{2}t\Big)\Big)^2. $$

Thus, $t\,g_3(t)$ is a product of two identical sinc functions and a sine function, which corresponds to a Fourier-domain convolution of a triangle window (the Fourier-domain pair of the squared sinc function) with a pair of impulses (the Fourier-domain pair of the sine function). This is equivalent to adding two shifted triangle windows (which are continuous functions); the derivative is thus continuous. Note that the second derivative is not continuous, since $t^2 g_3(t)$ is a product of a sinc function and two sine functions, which translates into adding four shifted rectangular windows in the frequency domain, and rectangular windows are not continuous.



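(iii) We now generalize what we have seen above. From (E4.4-1), we see that with $\omega_s = (i+1)\pi$, $G_i(\omega)$ must be of the form:

$$ G_i(\omega) = \begin{cases} 1, & |\omega| \le \pi; \\ 0, & |\omega| \ge i\pi. \end{cases} \qquad \text{(E4.4-4)} $$

Since $G_i(\omega) = (H_1 * \underbrace{H_2 * \cdots * H_2}_{(i-1)\ \text{times}})(\omega)$, we use (E4.4-4) to find $\omega_1$ and $\omega_2$:

$$ \frac{\omega_1}{2} + (i-1)\frac{\omega_2}{2} = i\pi, \qquad \frac{\omega_1}{2} - (i-1)\frac{\omega_2}{2} = \pi, $$

yielding $\omega_1 = (i+1)\pi$ and $\omega_2 = \pi$, or, from (4.57),

$$ g_i(t) = \operatorname{sinc}\Big(\frac{(i+1)\pi}{2}t\Big)\Big(\operatorname{sinc}\Big(\frac{\pi}{2}t\Big)\Big)^{i-1}; $$

this function clearly decays as $1/t^i$.

From (3.61a),

$$ (-jt)^{i-2}\, g_i(t) \;\longleftrightarrow\; \frac{d^{\,i-2} G_i(\omega)}{d\omega^{\,i-2}}. $$

Thus,

$$ t^{i-2} g_i(t) = \frac{2^{i-2}}{(i+1)\pi^{i-2}}\, \sin\Big(\frac{(i+1)\pi}{2}t\Big)\Big(\sin\Big(\frac{\pi}{2}t\Big)\Big)^{i-3}\Big(\operatorname{sinc}\Big(\frac{\pi}{2}t\Big)\Big)^2. $$

Thus, $t^{i-2} g_i(t)$ is a product of two identical sinc functions and $(i-2)$ sine functions, which corresponds to a Fourier-domain convolution of a triangle window (the Fourier-domain pair of the squared sinc function) with $(i-2)$ pairs of impulses (the Fourier-domain pair of the sine function). This is equivalent to adding shifted triangle windows (which are continuous functions); the $(i-2)$th derivative is thus continuous.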
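The cut-off choice $\omega_1 = (i+1)\pi$, $\omega_2 = \pi$ can be checked with a discrete sketch of the Fourier-domain convolution $H_1 * H_2 * \cdots * H_2$. The grid spacing and the normalization of the flat top to 1 are assumptions made here in place of the ideal filters' gain constants; the function name is illustrative.

```python
import numpy as np

def interp_spectrum(i, pts_per_pi=200):
    """Sketch of G_i(w) = (H_1 * H_2 * ... * H_2)(w), with (i - 1) copies of H_2.
    H_1, H_2 are ideal lowpass boxes with half-widths w1/2 = (i + 1)pi/2 and
    w2/2 = pi/2, sampled on a grid of spacing pi/pts_per_pi (pts_per_pi even).
    Returns w/pi and G_i normalized so that its flat top equals 1."""
    k = np.arange(-pts_per_pi * (i + 1), pts_per_pi * (i + 1) + 1)
    n1 = pts_per_pi * (i + 1) // 2           # half-width of H_1 in grid points
    n2 = pts_per_pi // 2                     # half-width of H_2 in grid points
    g = (np.abs(k) <= n1).astype(float)
    box2 = (np.abs(k) <= n2).astype(float)
    for _ in range(i - 1):                   # convolve with H_2, (i - 1) times
        g = np.convolve(g, box2, mode="same")
    return k / pts_per_pi, g / g.max()

for i in (2, 3, 4):
    w, g = interp_spectrum(i)
    flat = np.allclose(g[np.abs(w) <= 1], 1.0)   # passband |w| <= pi is untouched
    stop = np.allclose(g[np.abs(w) > i], 0.0)    # nothing beyond |w| = i pi
    print(i, flat, stop)
```

Each extra convolution with $H_2$ shrinks the flat top and widens the support by $\pi/2$ on each side, which is exactly the pair of conditions solved above; it also adds one order of smoothness in frequency, hence the faster decay in time.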

4.5. Dirichlet Kernel Proof 

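Prove that (4.90) is indeed a Fourier-series transform pair.

Solution:

$$
\begin{aligned}
g(t) &\overset{(a)}{=} \sum_{k\in\mathbb{Z}} G_k\, e^{j(2\pi/T)kt}
\overset{(b)}{=} \frac{T_s}{T}\sum_{k=-k_h}^{k_h} e^{j(2\pi/T)kt}
\overset{(c)}{=} \frac{T_s}{T}\sum_{n=0}^{2k_h} e^{j(2\pi/T)(n-k_h)t} \\
&\overset{(d)}{=} \frac{T_s}{T}\, e^{-j(2\pi/T)k_h t}\sum_{n=0}^{k_s-1} e^{j(2\pi/T)nt}
\overset{(e)}{=} \frac{T_s}{T}\, e^{-j(2\pi/T)k_h t}\,\frac{1 - e^{j(2\pi/T)k_s t}}{1 - e^{j(2\pi/T)t}} \\
&\overset{(f)}{=} \frac{T_s}{T}\, e^{-j(2\pi/T)k_h t}\, e^{j(\pi/T)k_s t}\, e^{-j(\pi/T)t}\,
\frac{e^{-j(\pi/T)k_s t} - e^{j(\pi/T)k_s t}}{e^{-j(\pi/T)t} - e^{j(\pi/T)t}} \\
&\overset{(g)}{=} \frac{T_s}{T}\,\frac{\sin\big((\pi/T)k_s t\big)}{\sin\big((\pi/T)t\big)}
\overset{(h)}{=} \frac{\operatorname{sinc}(\pi t/T_s)}{\operatorname{sinc}(\pi t/T)},
\end{aligned}
$$

where (a) follows from the definition of the Fourier series, (3.90b); (b) from the definition of the Fourier-series coefficients in (4.90), $G_k = T_s/T$ for $|k| \le k_h$ and 0 otherwise; (c) from the change of variable $n = k + k_h$; (d) from pulling the exponential and the constant in front of the sum, with $k_s = 2k_h + 1$; (e) follows from the finite sum formula (P1.65-1); in (f) we pulled out terms from both numerator and denominator; (g) follows from (2.275), since the exponential prefactor equals $e^{j(\pi/T)(k_s - 2k_h - 1)t} = 1$; and (h) from the expression for the sinc function, (2.8a), as well as $k_s = T/T_s$.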
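As a numerical cross-check of the derivation, the sketch below synthesizes $g(t)$ from the coefficients $G_k = T_s/T = 1/k_s$, $|k| \le k_h$, and compares it with the closed form. It assumes $T = 1$ and uses illustrative function names. Note that NumPy's `np.sinc(x)` is $\sin(\pi x)/(\pi x)$, whereas the text's sinc is $\sin(x)/x$, so $\operatorname{sinc}(\pi t/T_s)$ becomes `np.sinc(t / T_s)`.

```python
import numpy as np

def dirichlet_from_series(t, k_h, T=1.0):
    """Synthesize g(t) = sum_{|k| <= k_h} G_k exp(j 2 pi k t / T) with G_k = 1/k_s."""
    k_s = 2 * k_h + 1
    k = np.arange(-k_h, k_h + 1)
    return np.exp(2j * np.pi * np.outer(t, k) / T).sum(axis=1) / k_s

def dirichlet_closed_form(t, k_h, T=1.0):
    """Closed form sinc(pi t / T_s) / sinc(pi t / T), with T_s = T / k_s."""
    k_s = 2 * k_h + 1
    T_s = T / k_s
    return np.sinc(t / T_s) / np.sinc(t / T)     # np.sinc(x) = sin(pi x) / (pi x)

t = np.linspace(-2.3, 2.3, 801)   # avoids nonzero integer multiples of T, where both sincs vanish
for k_h in (1, 3, 5):
    print(k_h, np.allclose(dirichlet_from_series(t, k_h), dirichlet_closed_form(t, k_h)))
```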
4.6. Effect of Noise on Cascades of Sampling and Interpolation 

Let $x$ be an $M$-dimensional random vector $x = [x_0\ x_1\ \cdots\ x_{M-1}]^T$, whose mean is zero and whose autocorrelation matrix is $A_x = E[xx^*] = \sigma^2 I$. Let also the sampling operator $\tilde{\Phi}^*$ and interpolation operator $\Phi$ be consistent. Find the mean and the autocorrelation matrix of the outputs of the cascades

(i) interpolation followed by sampling, $y = \tilde{\Phi}^*\Phi x$; and
(ii) sampling followed by interpolation, $\hat{x} = \Phi\tilde{\Phi}^* x$.

Solution:

(i) The mean of the output of interpolation followed by sampling is

$$ E[y] = E[\tilde{\Phi}^*\Phi x] = E[x] = 0, $$

because $\tilde{\Phi}^*$ and $\Phi$ are consistent ($\tilde{\Phi}^*\Phi = I$ from (4.38)). Similarly, the autocorrelation matrix is

$$ E[yy^*] = E[xx^*] = \sigma^2 I. $$

(ii) The mean of the output of sampling followed by interpolation is

$$ E[\hat{x}] = E[\Phi\tilde{\Phi}^* x] = \Phi\tilde{\Phi}^* E[x] = 0. $$

The autocorrelation matrix is

$$ E[\hat{x}\hat{x}^*] = E[\Phi\tilde{\Phi}^* x x^* \tilde{\Phi}\Phi^*] = \Phi\tilde{\Phi}^* E[xx^*]\,\tilde{\Phi}\Phi^* = \sigma^2\, \Phi\tilde{\Phi}^*\tilde{\Phi}\Phi^*. $$
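A Monte Carlo sketch of case (ii). It assumes real-valued operators (so $^*$ becomes a transpose), illustrative sizes, and a hypothetical consistent pair built as $\tilde{\Phi} = B(B^T\Phi)^{-T}$ from a random $B$, which guarantees $\tilde{\Phi}^T\Phi = I$ while keeping $\Phi\tilde{\Phi}^T$ an oblique (not orthogonal) projection.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma = 6, 3, 0.5                        # illustrative sizes and noise level
Phi = rng.standard_normal((N, M))              # interpolation operator (real, N x M)
B = rng.standard_normal((N, M))
Phi_tilde = B @ np.linalg.inv(B.T @ Phi).T     # chosen so that Phi_tilde^T Phi = I
consistent = np.allclose(Phi_tilde.T @ Phi, np.eye(M))   # case (i) then gives y = x

n_trials = 100_000
x = sigma * rng.standard_normal((N, n_trials))  # zero mean, autocorrelation sigma^2 I
x_hat = Phi @ (Phi_tilde.T @ x)                 # (ii) sampling followed by interpolation

predicted = sigma**2 * Phi @ Phi_tilde.T @ Phi_tilde @ Phi.T
empirical = x_hat @ x_hat.T / n_trials
rel_err = np.linalg.norm(empirical - predicted) / np.linalg.norm(predicted)
print(consistent, rel_err)                      # rel_err shrinks like 1/sqrt(n_trials)
```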

4.7. Papoulis-Gerchberg Algorithm 

Consider the inpainting problem discussed in Section 4.7.1.

(i) Pose the problem as solving a linear system.

(ii) Calculate condition numbers for the two cases of pixels missing in a block as opposed to missing randomly. Consider a one-dimensional vector $x$ of size $N = 1024$, with a bandlimited DFT that is zero for $|k| > 400$. In $x$, set 128 samples to 0, in three different ways by zeroing out (a) the first 128 samples, (b) every 8th sample, and (c) 128 random samples. Compare the resulting condition numbers and the speed of convergence of the corresponding Papoulis-Gerchberg algorithms.

Solution: TBD. 
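Part (ii) can be explored numerically. The sketch below assumes real vectors and DFT indices $k \in \{-512, \ldots, 511\}$; the helper names are hypothetical. It poses (i) as $QSm = -Qy$, where $Q$ stacks the out-of-band DFT rows, $S$ selects the missing samples $m$, and $y$ is the observed vector with zeros at the missing positions, and it runs a plain Papoulis-Gerchberg iteration (bandlimit, then re-impose the observed samples).

```python
import numpy as np

N, K_MAX, N_MISSING = 1024, 400, 128
k = np.fft.fftfreq(N, d=1.0 / N)              # integer DFT indices 0..511, -512..-1
in_band = np.abs(k) <= K_MAX                   # DFT support of the bandlimited vectors

def bandlimit(x):
    """Orthogonal projection onto vectors whose DFT vanishes for |k| > K_MAX."""
    return np.real(np.fft.ifft(np.where(in_band, np.fft.fft(x), 0)))

def condition_number(missing):
    """Condition number of Q S in the linear system Q S m = -Q y."""
    n = np.arange(N)
    Q = np.exp(-2j * np.pi * np.outer(k[~in_band], n) / N)
    return np.linalg.cond(Q[:, missing])

def papoulis_gerchberg(y, missing, n_iter):
    """Alternate between bandlimiting and re-imposing the observed samples of y."""
    known = np.ones(N, dtype=bool)
    known[missing] = False
    x = np.where(known, y, 0.0)
    for _ in range(n_iter):
        x = bandlimit(x)
        x[known] = y[known]
    return x

rng = np.random.default_rng(0)
x_true = bandlimit(rng.standard_normal(N))     # a random real bandlimited vector
patterns = {
    "block": np.arange(N_MISSING),                               # (a) first 128 samples
    "every 8th": np.arange(0, N, 8),                             # (b) every 8th sample
    "random": np.sort(rng.choice(N, N_MISSING, replace=False)),  # (c) 128 random samples
}
for name, missing in patterns.items():
    x_rec = papoulis_gerchberg(x_true, missing, n_iter=300)
    err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
    print(f"{name:>9}: cond = {condition_number(missing):.3e}, relative error = {err:.2e}")
```

For pattern (b), the columns of $QS$ are 128-point DFT vectors in which every residue appears once or twice, so the condition number is $\sqrt{2}$ and each iteration shrinks the error by at least a factor $7/8$; the block pattern is the one to watch for poor conditioning and slow convergence.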
Exercises

4.1. Least-Squares Approximation of Sampling Followed by Interpolation
Let $x(t) \in \mathcal{L}^2(\mathbb{R})$. Show that

$$ \min_{\hat{x} \in S} \int_{-\infty}^{\infty} |x(t) - \hat{x}(t)|^2\, dt, $$

where $S$ is the space of functions piecewise constant over integer intervals, is obtained for

$$ \hat{x}(t) = \int_n^{n+1} x(\tau)\, d\tau, \qquad t \in [n, n+1),\ n \in \mathbb{Z}. $$

Use this to show that $P = \Phi\Phi^*$, with $\Phi^*$ the sampling operator from (4.2) and $\Phi$ the interpolation operator from (4.3), achieves this least-squares approximation.

4.2. Sampling and Interpolation for Bandlimited Vectors 

Given is a vector $x \in \mathbb{C}^M$ and its DFT $X$ from (2.159a). It is said to have bandwidth $k_0$, $k_0$ odd,³ when its DFT satisfies

$$ X_k = 0 \quad \text{for all } \frac{k_0 - 1}{2} < |k| \le \frac{M}{2}. $$

Define the subspace of bandlimited vectors $\mathrm{BL}\{-\frac{k_0-1}{2} \bmod M, \ldots, \frac{k_0-1}{2} \bmod M\}$ as the set of vectors $x$ with bandwidth $k_0$. For $x$ in such a bandlimited subspace, find $\Phi$ so that the system described by $\Phi\Phi^*$ in Section 4.2.1 achieves perfect recovery $\hat{x} = x$.

4.3. Orthogonal Projection with No Sampling Prefilter 

Consider the system depicted in Figure 4.14(b) with no sampling prefilter. Under what condition on the interpolation postfilter $g$ does the system implement an orthogonal projection? Describe the subspace that is the range of that orthogonal projection.

4.4. Sampling the DTFT of a Finite-Length Sequence

Consider a real sequence $x_n$ with finite support $[-(k_0-1)/2, (k_0-1)/2]$. Show that its DTFT $X(e^{j\omega})$ can be sampled at $k_0$ points, $\omega_k = 2\pi k/k_0$, or $X_k = X(e^{j\omega_k})$, so that $x_n$ can be recovered from $X_k$.

4.5. Bandlimiting as Orthogonal Projection 

(i) Show that an ideal lowpass filter with cut-off frequency $\omega_0/2$ computes the orthogonal projection of its input onto $\mathrm{BL}[-\omega_0/2, \omega_0/2]$.

(ii) Indicate the class of filters that compute orthogonal projections (not necessarily lowpass).

4.6. Bandlimited Space with Rational Sampling Rate Changes

Given is a system in Figure P4.6-1 and a sequence $x_n \in \mathrm{BL}[-2\pi/3, 2\pi/3]$. Find out what the filter $g_n$ has to be for $\hat{x} = x$ when:

(i) $M = 2$, $N = 3$;
(ii) general coprime $M$ and $N$.

4.7. Bases for Shift-Invariant Subspaces

Let $x_n = \delta_n + \delta_{n-1} + \delta_{n-2}$ and let $S$ be the shift-invariant subspace with respect to shift 2 generated by $x$. Find an orthonormal basis for $S$. What up- and downsampling factor, sampling prefilter and interpolation postfilter should be used so that the system in Figure 4.7(b) computes an orthogonal projection onto $S$?



³ $k_0$ is odd because for a real vector, $X_k = X^*_{-k}$; see Table 3.4.
Figure P4.6-1: Sampling followed by interpolation with rational sampling rate changes.



4.8. Ideally Matched Sampling and Interpolation with Nonorthogonal Sequences 
Find the ideally matched

(i) interpolation operator for the sampling operator $\tilde{\Phi}^*$ in (4.37a); as well as
(ii) sampling operator for the interpolation operator $\Phi$ in Example 4.12.

4.9. Western Movies and Sampling 

Classic movies were shot at 24 frames/s, each frame being a snapshot of the scene with an exposure time of roughly 10 ms. Given a carriage whose wagon wheels have 16 spokes and a diameter of 2 m, at what speed(s) do the wheels look motionless?

4.10. Downsampling by N 



Prove the expression for downsampling by $N$, (2.184), by going back to the underlying continuous-time function $x_c(t)$ and resampling it with an $N$-times longer sampling period. That is, consider $x_n$ and $y_n = x_{Nn}$ as two sampled versions of the same continuous-time function $x_c(t)$, with sampling periods $T$ and $NT$, respectively.
[Hint: Use (4.5) and (4.30).]

4.11. Multirate System 

For the discrete system $y_n = (U_3 G D_2 x)_n$, with $G$ an LSI filter:

(i) Give the z-transform of the output signal.

(ii) Assume the underlying continuous-time input function $x_c(t) \in \mathrm{BL}[-\pi/T, \pi/T]$, and that it was sampled at $1/T$ Hz. Write $Y(e^{j\omega})$ as a function of $X_c(\omega)$ and $G(e^{j\omega})$ in the frequency domain. Also, specify the increase/decrease in the sampling rate achieved by this system and the conditions on $X_c(\omega)$ to avoid aliasing.

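4.12. Sinc Orthonormal Basis for $\mathrm{BL}[-\pi/T, \pi/T]$
Let $T \in \mathbb{R}^+$, and for each integer $n$ let

$$ \varphi_n(t) = \frac{1}{\sqrt{T}}\operatorname{sinc}\Big(\frac{\pi}{T}(t - nT)\Big). $$

(i) Show that the family $\{\varphi_n(t)\}_{n\in\mathbb{Z}}$ is orthonormal, that is,

$$ \langle \varphi_n(t), \varphi_m(t)\rangle = \delta_{n-m}. $$

(ii) Show that this family forms an orthonormal basis for $\mathrm{BL}[-\pi/T, \pi/T]$.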

4.13. Bandpass Sampling 

Refer to Example 4.16 and the spectrum $X(\omega)$ in (4.59).

(i) Show that the solution described in the example can be expressed as an orthonormal expansion for the bandpass subspace $\mathrm{BL}([-3\pi, -2\pi] \cup [2\pi, 3\pi])$ and give the basis functions.
(ii) An alternative to a pure sampling solution is to demodulate (shift in frequency) the bandpass signal into the baseband, or $\mathrm{BL}[-\pi, \pi]$. Give the demodulation function, the sampling, and the reconstruction (which requires modulation).
(iii) Show that bandpass signals with frequency support $[-(K+1)\omega_0/2, -K\omega_0/2] \cup [K\omega_0/2, (K+1)\omega_0/2]$ can be sampled with sampling frequency $\omega_s = \omega_0$ and perfectly reconstructed.
4.14. Continuous-Time Modulation Using Discrete-Time Operators

In Proposition 4.16 we saw how continuous-time convolution can be implemented in discrete time. Consider two bandlimited functions, $x(t) \in \mathrm{BL}[-\omega_x/2, \omega_x/2]$ and $g(t) \in \mathrm{BL}[-\omega_g/2, \omega_g/2]$.

(i) Show how the multiplication $y(t) = g(t)x(t)$ can be implemented in discrete time.
(ii) Compute $y(t)$ if $x(t)$ is as in (4.56) and $g(t)$ is a cosine function as in (4.55a) with $\omega_0 = 3\pi$.
(iii) Is there a more efficient way to compute $y(t)$ in (ii)?

(Hint: Use the digital-to-analog conversion and interpolation.)

4.15. Multichannel Sampling 
Let $x(t) \in \mathrm{BL}[-\pi, \pi]$.

(i) Let $N = 3$ and show how 3 channels followed by sampling with period $T = 3$ perfectly reconstruct the input function if and only if the matrix in (4.67) is nonsingular for $\omega \in [0, 2\pi/3]$.

(ii) Repeat (i) but where the channel inputs are $x(t)$, $x'(t)$, and $x''(t)$.

(iii) Generalize (ii) to an arbitrary number of derivatives, that is, prove that an $N$-channel filter bank with channel inputs $x(t), x'(t), \ldots, x^{(N-1)}(t)$, each sampled with period $T = N$, perfectly reconstructs the input function if and only if the matrix in (4.67) is nonsingular for $\omega \in [0, 2\pi/N]$.

(iv) Consider irregular sampling and again a 3-channel system. Let $g_0(t) = \delta(t)$, $g_1(t) = \delta(t - 1 - \alpha)$ and $g_2(t) = \delta(t - 1 - \beta)$, where $\alpha, \beta \in [-1, 1]$. For which values of $\alpha, \beta$ does $\det(G(\omega)) = 0$? Plot the condition number of $G(\omega)$ for $\alpha = 1$, $\beta \in [-1/2, 1/2)$ and $\omega \in [0, 2\pi/3]$. Comment on the result.

4.16. Adjoint Operators 

Prove that the sampling operator (4.78) and the interpolation operator (4.80) are adjoints of each other.

(Hint: Mimic the proof for functions, (4.45).)

4.17. Dirac Delta Comb Fourier Series Pair 



Prove that (4.82a) is indeed a Fourier-series transform pair. 

4.18. Dirichlet Kernel 

In (4.90) we computed the Dirichlet kernel by assuming a bandlimited periodic function and a periodic filter $g(t)$. Do the same by assuming a nonperiodic filter $g(t)$ and then sampling the result in the Fourier domain.

4.19. Fourier Series with Triangle Spectrum 

Let $x(t) \in \mathrm{BL}\{-k_h, \ldots, k_h\}$, with $k_s = 2k_h + 1$, be a periodic function with period $T = 1$ whose Fourier series coefficients are real and symmetric and given by

$$ X_k = \begin{cases} 1 - |k|/k_h, & |k| < k_h; \\ 0, & \text{otherwise}. \end{cases} $$

(i) Show that sampling $x(t)$ at $nT_s = n/k_s$, $n = -k_h, -k_h + 1, \ldots, k_h - 1, k_h$, ensures perfect function recovery and indicate how to do it.
(ii) Comment on what happens when sampling uniformly with fewer than $k_s$ samples.

4.20. Continuous-Time Function Leading to i.i.d. Samples

Let $x(t)$ be a continuous-time function whose projection onto $\mathrm{BL}[-\pi/T, \pi/T]$ is a continuous-time white Gaussian noise. Prove that filtering $x(t)$ with an ideal lowpass filter of bandwidth $\omega_0 = 2\pi/T$ and sampling with period $T$ leads to a sequence of samples $y_n$ that is discrete-time white Gaussian noise.
In previous chapters, we saw how to write a sequence or a function as an expansion in a basis (Chapters 1-3), or using sampling and interpolation (Chapter 4).
Often, however, we do not have the luxury of representing the sequence or function 
exactly, requiring the development of approximate representations: 

(i) If the expansion is too big, that is, it requires too many coefficients to represent 
the function or the sequence, we need truncation methods. For example, in a 
Fourier series representation, we can decide to keep a subset of the coefficients. 
How many and which ones to keep influence the choice of the approximation 
method, and the errors so incurred affect the approximation quality. 

(ii) If the function varies too fast, then sampling cannot capture its variations. We then typically first smooth (bandlimit) the function, which amounts to projecting onto a subspace, such as the lowpass filtering we saw in Section 4.4.2. The projection error is a measure of how well we can approximate the function using a finite-bandwidth approximation.
(iii) If the function or sequence is too complex, that is, the expansion coefficients require too much storage space or bandwidth for transmission, we need to reduce both their number, as in (i), and their accuracy, leading to compression.



Approximation theory deals with the choice of expansion coefficients to keep; 
compression theory deals with approximating those coefficients. Some methods are 
very classical, such as approximation via Taylor series, while others are more recent, 
such as nonlinear approximation in bases. We review a collection of approaches with 
a clear message: there is no one-size-fits-all technique. Each case requires a careful 
analysis and a well-engineered solution. When this is properly done, a good solution 
can have high impact, as demonstrated by the billions of cameras, mobile phones 
and computers using JPEG to represent images efficiently. 

5.1 Introduction 

Given a function x(t), we often do not have the luxury of representing it exactly, 
but must instead be satisfied with an approximation x(t). This may be due to a 
variety of reasons. 

(i) The function may not be known everywhere, but only at a number of specific 
points. Given these points, we may try to interpolate the function as well as 
we can. 
(ii) The function may be known everywhere, and may thus have an expansion 
such as a Fourier series over an interval, but we may not be able to afford 
to compute the full expansion for complexity and storage reasons. Instead, 
we may be able to obtain just an approximate expansion or a projection on a 
subspace. Then, we may be interested in knowing how good the approximation 
is. 
(iii) Finally, expansion coefficients themselves, which are typically real numbers, 
might have to be approximated as well, given the finite precision and storage 
requirements. This is when we consider signal compression. 

Of these various approximations, we have already discussed the projection onto 
a subspace in the particular case of sampling and interpolation in the previous 
chapter. Our aim here is to broaden this approximation view to a wider range of 
methods, including compression. To make these points concrete, we go through a 
simple example. 

Function to Be Approximated Given are piecewise constant functions over the unit interval. Assume the number of constant pieces $K$ to be known, while the transition points $\{t_k\}_{k=1}^{K-1}$, which lie in the interval $[0, 1]$, and the values of the constants $\{a_k\}_{k=0}^{K-1}$ are unknown. An example function for $K = 3$ is shown in Figure 5.1. Such a function is called parametric since, given $K$, the set of $2K - 1$ parameters $\{t_k\}_{k=1}^{K-1}$ and $\{a_k\}_{k=0}^{K-1}$ specifies $x(t)$.

Let us consider a few possible approximate representations. 
Figure 5.1: Piecewise constant function with $K = 3$, $\{t_1, t_2\} = \{1/4, 1/2\}$, $\{a_0, a_1, a_2\} = \{1, 2, 4\}$.
Figure 5.2: Best least-squares approximation $\hat{x}_K(t)$ of $x(t)$ from Figure 5.1 using a Legendre polynomial basis of degree $K$: (a) $K = 0$, (b) $K = 1$, (c) $K = 2$.



Least-Squares Polynomial Approximation First, we try fitting $x(t)$ with a polynomial of degree $K$. We should not expect too much, since the function is discontinuous, while polynomials are smooth. As an example, take the Legendre polynomials seen in Solved Exercise 1.5, since they form an orthonormal basis for an interval. In Figure 5.2 we show the best least-squares approximation $\hat{x}_K(t)$ using Legendre polynomials of degree $K = 0, 1, 2$.
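A minimal sketch of this projection, assuming the orthonormal shifted Legendre basis $\sqrt{2l+1}\,P_l(2t-1)$ on $[0, 1]$ and a midpoint-rule quadrature for the inner products (the helper names are illustrative). It uses `numpy.polynomial.legendre.legval` to evaluate $P_l$.

```python
import numpy as np
from numpy.polynomial import legendre

def x_example(t):
    """Piecewise constant x(t) of Figure 5.1: 1 on [0, 1/4), 2 on [1/4, 1/2), 4 on [1/2, 1]."""
    return np.select([t < 0.25, t < 0.5], [1.0, 2.0], default=4.0)

m = 2**14
t = (np.arange(m) + 0.5) / m          # midpoint-rule nodes; breakpoints fall between nodes
x = x_example(t)

def legendre_basis(l):
    """Orthonormal shifted Legendre polynomial sqrt(2l + 1) P_l(2t - 1) on [0, 1]."""
    c = np.zeros(l + 1)
    c[l] = 1.0
    return np.sqrt(2 * l + 1) * legendre.legval(2 * t - 1, c)

def legendre_ls(K):
    """Best least-squares approximation by polynomials of degree at most K."""
    return sum((np.sum(x * legendre_basis(l)) / m) * legendre_basis(l) for l in range(K + 1))

errors = [np.sum((x - legendre_ls(K)) ** 2) / m for K in range(3)]
print(errors)   # non-increasing: each degree adds one orthonormal basis function
```

For this $x(t)$ the squared errors are about 1.69, 0.27 and 0.23 for $K = 0, 1, 2$, which quantifies the "should not expect too much" above.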



Lagrange Interpolation: Matching Points Instead of using orthogonal polynomials such as Legendre polynomials, we can try polynomial interpolation, where specific points of the function are matched exactly, as is the case, for example, with Lagrange interpolation. An advantage is that the function need not be known everywhere, but only at the interpolation points, which must be distinct. Figure 5.3 shows the result for 1, 2, and 3 known points of $x(t)$ from Figure 5.1. The result is not entirely satisfactory, unsurprisingly so, since polynomials are necessarily smooth, unlike the function we are trying to match.



Taylor Series Expansion: Matching Derivatives When an analytical expression 
of the function to be approximated is available and derivatives exist up to some 
order, we can match the function and its derivatives using the well-known Taylor 
series at a point of interest. The advantage is that the representation is exact at 
that point, while it gradually worsens when moving away. In some sense, one can 
think of Taylor series as function extrapolation, as opposed to the Lagrange method, which is a function-interpolation method.
Figure 5.3: Lagrange interpolation of $x(t)$ from Figure 5.1 using $K$ known distinct points: (a) $K = 1$, (b) $K = 2$, (c) $K = 3$.



Other Polynomial Approximation Methods A mixture of Lagrange interpolation and Taylor extrapolation is a hybrid method called Hermite interpolation, which matches both points and derivatives. While Legendre polynomials minimize the quadratic approximation error, the Lagrange, Taylor and Hermite methods do not minimize any particular norm. Minimizing the maximum error leads to minimax polynomial approximation and Chebyshev polynomials.

Approximation of Functions by Splines Polynomial approximation is inherently local. For a function on the real line, while approximating by polynomials on a set of successive intervals is a possibility, we will encounter problems at interval boundaries. Instead, methods preserving continuity and derivatives are preferable. Among such methods, splines are the most popular, partly due to their computational efficiency. We consider in particular uniform splines, since they generate a shift-invariant subspace and are closely related to regular sampling.

Consider approximating our function in Figure 5.1 using constant splines, or the elementary box function (3.76). This seems like a good idea because both the constant spline and the function we want to approximate are piecewise constant. However, the constant spline will have a support of length $1/N$ when the interval $[0, 1]$ is split into $N$ uniform pieces. In Figure 5.4 we show the approximation using 3, 6 and 9 constant splines. The good news is that the resulting approximation is piecewise constant; the not so good news is that to be close to the original function, the approximation requires $N$ to be large. Because of the particular choice of transition points in the function from Figure 5.1, choosing $N = 4$ would have given a perfect approximation. Since constant splines and their shifts form an orthogonal basis, this approximation is a best least-squares approximation as well.
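Because the length-$1/N$ boxes are orthogonal, each coefficient is the local average of $x(t)$ and, by Pythagoras, the squared error is $\|x\|^2 - \sum_n a_n^2/N$. The sketch below (illustrative names) computes this exactly from the antiderivative of $x(t)$ in Figure 5.1.

```python
import numpy as np

def x_integral(t):
    """X(t) = int_0^t x(s) ds for x(t) of Figure 5.1 (levels 1, 2, 4; breaks at 1/4, 1/2)."""
    return (1.0 * np.clip(t, 0, 0.25)
            + 2.0 * np.clip(t - 0.25, 0, 0.25)
            + 4.0 * np.clip(t - 0.5, 0, 0.5))

ENERGY = 1.0 * 0.25 + 4.0 * 0.25 + 16.0 * 0.5      # ||x||^2 = 9.25

def constant_spline_error(N):
    """Squared L2 error of the orthogonal projection onto length-1/N boxes on [0, 1]."""
    edges = np.arange(N + 1) / N
    a = N * np.diff(x_integral(edges))               # averages of x over [n/N, (n+1)/N)
    return ENERGY - np.sum(a**2) / N

for N in (3, 4, 6, 9):
    print(N, constant_spline_error(N))
```

The errors are $19/48$, $0$, $1/24$ and $19/144$ for $N = 3, 4, 6, 9$: the error is not monotone in $N$, since what matters is whether the grid contains the transition points 1/4 and 1/2.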

Note that splines, in particular uniform splines of which the constant func- 
tion is the lowest-order representative, have a number of nice properties that will 
be explored in more detail in this chapter. Functions living in spline spaces are 
another example (besides bandlimited functions and sampling) where discrete-time 
processing performs continuous-time processing in a precise way, and we will show 
this specifically for derivatives and integrals. 

Figure 5.4: Length-$1/N$ constant spline approximation of $x(t)$ from Figure 5.1: (a) $N = 3$, (b) $N = 6$, (c) $N = 9$.

Linear Approximation in Fourier Bases Since we are considering a function on an interval, it is natural to look at the Fourier series representation. We already know that the Gibbs phenomenon will hurt at discontinuities and lead to poor convergence. Figure 5.5 shows the Fourier series approximation with the $2K + 1$ lowest-frequency terms, for $K = 1$, $K = 5$, and $K = 10$. Since the Fourier series is an orthonormal system, we again obtain a best least-squares approximation, but the trigonometric polynomials of the Fourier series are not a good match to the piecewise constant function, and convergence is slow.

Figure 5.5: Linear approximation of $x(t)$ from Figure 5.1 using the Fourier series with $2K + 1$ lowest-frequency terms: (a) $2K + 1 = 3$, (b) $2K + 1 = 11$, (c) $2K + 1 = 21$.

The approximation we just saw is an example of linear approximation accomplished via orthogonal projection. We decide a priori that $\hat{x}(t)$ is the orthogonal projection of $x(t)$ onto a (fixed) subspace spanned by the $2K + 1$ lowest-frequency Fourier series basis vectors.
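Since the projection is orthogonal, its squared error follows from Parseval's equality as $\|x\|^2 - \sum_{|k| \le K} |X_k|^2$. The sketch below computes the Fourier series coefficients of $x(t)$ from Figure 5.1 exactly, piece by piece, assuming period 1 and the convention $X_k = \int_0^1 x(t) e^{-j2\pi kt}\,dt$; the names are illustrative.

```python
import numpy as np

EDGES = np.array([0.0, 0.25, 0.5, 1.0])    # breakpoints of x(t) in Figure 5.1
LEVELS = np.array([1.0, 2.0, 4.0])         # constant values a_0, a_1, a_2

def fourier_coeff(k):
    """X_k = int_0^1 x(t) exp(-j 2 pi k t) dt, computed exactly on each constant piece."""
    if k == 0:
        return np.sum(LEVELS * np.diff(EDGES))
    e = np.exp(-2j * np.pi * k * EDGES)
    return np.sum(LEVELS * (e[:-1] - e[1:])) / (2j * np.pi * k)

energy = np.sum(LEVELS**2 * np.diff(EDGES))  # ||x||^2 = 9.25

def linear_approx_error(K):
    """Squared error when keeping the 2K + 1 lowest-frequency terms (Parseval)."""
    kept = sum(abs(fourier_coeff(k)) ** 2 for k in range(-K, K + 1))
    return energy - kept

for K in (1, 5, 10):
    print(2 * K + 1, linear_approx_error(K))
```

Because of the jumps (including the one at the period boundary), $|X_k|$ decays only as $1/|k|$, so the error decreases roughly like $1/K$, which is the slow convergence visible in Figure 5.5.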



Nonlinear Approximation An alternative is to choose the largest-magnitude coef- 
ficients in the orthonormal basis, which now depends on the particular x(t) we wish 
to approximate. This is an adaptive subspace approximation, because we choose 
the best subspace depending on the function to be approximated. In Figure 5.6
we show this approximation using an orthonormal Haar wavelet basis, and retain- 
ing the largest coefficients in the expansion. As can be seen, the approximation 
is much better than that with Fourier series. This is due to both the basis (Haar 
wavelets, being piecewise constant, are better suited to the function we wish to 
approximate) as well as the approximation method (adapting to the function by 
retaining the largest coefficients) . We explore both linear and nonlinear approxima- 
tion in orthonormal bases in what follows. In the case of random processes, a classic 
linear approximation method is the Karhunen-Loeve transform (KLT), a principal 
component analysis based on the autocorrelation matrix of the process. 
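A minimal sketch of nonlinear approximation in an orthonormal discrete Haar basis, assuming $x(t)$ from Figure 5.1 is represented by $2^{10}$ cell values (exact, since its breakpoints are dyadic). The transform and helper names are written here for illustration; keeping the $K$ largest-magnitude coefficients adapts the subspace to $x(t)$.

```python
import numpy as np

def haar_forward(v):
    """Full-depth orthonormal discrete Haar transform of a length-2^J vector.
    Returns [details at finest scale, ..., details at coarsest scale, scaling coefficient]."""
    coeffs, a = [], np.asarray(v, dtype=float)
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # wavelet (detail) coefficients
        a = (even + odd) / np.sqrt(2)              # scaling coefficients at the next scale
    coeffs.append(a)
    return coeffs

def haar_inverse(coeffs):
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def nonlinear_approx(v, K):
    """Keep the K largest-magnitude Haar coefficients, zero the rest, and invert."""
    coeffs = haar_forward(v)
    flat = np.concatenate(coeffs)
    keep = np.argsort(np.abs(flat))[::-1][:K]
    pruned = np.zeros_like(flat)
    pruned[keep] = flat[keep]
    parts = np.split(pruned, np.cumsum([c.size for c in coeffs])[:-1])
    return haar_inverse(parts)

n = 2**10
t = (np.arange(n) + 0.5) / n
x = np.select([t < 0.25, t < 0.5], [1.0, 2.0], default=4.0)   # x(t) of Figure 5.1
errors = [np.mean((x - nonlinear_approx(x, K)) ** 2) for K in (1, 2, 3)]
print(errors)
```

Here only three Haar coefficients are nonzero, so the squared error drops from 1.6875 to 0.125 and then to 0 with $K = 1, 2, 3$ terms, whereas three Fourier terms leave a substantial error.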
Figure 5.6: Nonlinear approximation of $x(t)$ from Figure 5.1 in the Haar wavelet basis with the $K$ largest terms.




Figure 5.7: Quantization of the unit square with (a) square and (b) hexagonal cells.



Compression Finally, let us turn to compression, where, given a representation (which could be an approximation using a polynomial or a basis), we need to approximate it, or quantize the parameters of the representation. The simplest form of this approximation is rounding to the nearest integer. Consider the unit square in $\mathbb{R}^2$, $[0, 1]^2$. The rounding-to-the-nearest-integer rule means that all vectors inside a given cell are represented by that cell's center, shown in Figure 5.7(a) with square cells and Figure 5.7(b) with hexagonal cells.
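A sketch of the square-cell case: a uniform scalar quantizer with step $\Delta$ (here $\Delta = 1/8$, an illustrative choice) applied to each coordinate maps every point of a cell to the cell's center. For points uniformly distributed over the square, the quantization error in each coordinate is uniform on $[-\Delta/2, \Delta/2]$, so its mean squared value is $\Delta^2/12$.

```python
import numpy as np

def quantize(v, step):
    """Uniform scalar quantizer: round each coordinate to the nearest multiple of step,
    i.e., map every point of a square cell to the cell's center."""
    return step * np.round(v / step)

rng = np.random.default_rng(0)
step = 1 / 8
v = rng.uniform(0.0, 1.0, size=(1_000_000, 2))   # points in the unit square [0, 1]^2
err = v - quantize(v, step)
mse_per_dim = np.mean(err**2)
print(mse_per_dim, step**2 / 12)                  # empirical vs. Delta^2 / 12
```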

When a representation is to be stored on a computer or transmitted over a channel, the number of bits used to describe the signal becomes the currency. Take the Fourier series representation with 3 bits for each of the coefficients; then 9, 33, and 63 bits are used for the linear approximations in Figure 5.5 with 3, 11, and 21 coefficients, respectively. This results in an increased error; Figure 5.8 shows the result. In addition, the various binary numbers may occur with different probabilities, so that their entropy is lower than the number of bits used, which potentially leads to more compression.

What we just saw raises many questions which are at the heart of signal 
compression, such as (1) what the best representation for compression purposes is; 
(2) what good quantization strategies are; and (3) how one should allocate bits in 
a representation. The answers will depend on signal models and the related notion 
of entropy. A branch of information theory called rate-distortion theory gives some 
fundamental bounds in a somewhat abstract setting where complexity is not an 
issue. In the more practical realm of real signal compression, transform coding has become the standard answer as well as the workhorse of multimedia compression algorithms. This has led to methods with high practical impact, such as audio compression as in MP3 or image compression as in JPEG.

Figure 5.8: Linear approximation (Figure 5.5, dotted) of $x(t)$ (Figure 5.1, dashed) using the quantized Fourier series (solid) with $2K + 1$ lowest-frequency terms and 3 bits/coefficient: (a) $2K + 1 = 3$, (b) $2K + 1 = 11$, (c) $2K + 1 = 21$.

a3.0 [October 2011] CC by-nc-nd
Comments to book-errata@FourierAndWavelets.org
Fourier and Wavelet Signal Processing
Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

Chapter Outline 

The chapter follows this brief introduction: Section 5.2 discusses approximation of functions on finite intervals by polynomials, starting with the orthonormal basis given by Legendre polynomials, moving to matching a function at specific points using Lagrange interpolation, and reviewing the classic Taylor expansion that matches a function and its derivatives at a given point. We discuss minimax approximation, in particular using Chebyshev polynomials. Section 5.3 looks into approximating functions on the real line by splines and the shift-invariant subspace they generate. We calculate explicit projections on spline spaces, as well as orthogonalizations. We also calculate continuous-time operators such as derivatives and integrals on the discrete sequence of spline coefficients. Section 5.4 considers approximations in bases, both with linear and nonlinear methods via truncation of coefficients. The linear approximation method just truncates the series, while the nonlinear one is an adaptive subspace selection matched to the function to be approximated; the coefficients are first reordered and then truncated. For stochastic processes, the Karhunen-Loeve transform (KLT) is derived as an optimal linear approximation method. In Section 5.5 we switch gears and approximate expansion coefficients, leading to compression and transform coding. We review entropy coding, quantization and bit allocation. Then, we discuss transform coding in detail, together with its main components. Section 5.6 closes with computational aspects.

5.2 Approximation of Functions on Finite Intervals 
by Polynomials 

The previous chapter focused on sampling and interpolation operators and on both
exact and approximate representations, using these operators, of sequences defined on
the integers and of functions defined on the real line. The infinite lengths encountered
there made it imperative to use LSI filtering in the sampling and interpolation
operations. Otherwise, similarly to what we saw with nonuniform sampling,
the computations can become complicated. We now shift our attention to approximating
a function on a finite interval using polynomials. Here, we will not have
shift-invariance properties, and, in fact, the behavior near the endpoints of the
interval is often quite different from the behavior near the center.

Throughout this section, we denote by $x(t)$ the function to be approximated
on the finite interval $[a, b] \subset \mathbb{R}$, by

$$p_K(t) = \sum_{k=0}^{K} a_k t^k \qquad (5.1a)$$

the approximating polynomial of degree at most $K$ (see ([2380])), and by

$$e_K(t) = x(t) - p_K(t) \qquad (5.1b)$$

the error between the function $x(t)$ and its approximation $p_K(t)$.

Minimizing an appropriate norm of the error leads to different types of approximations;
bounds on such norms give insight into the quality of the approximation.

We start in Section 5.2.1 with least-squares polynomial approximation, in
which series expansions with respect to orthogonal polynomials in general, and
Legendre polynomials in particular, arise naturally. We then discuss matching
approximating polynomials to a given function at a specific number of points
(Lagrange interpolation, Section 5.2.2) or matching derivatives up to a certain order
at a single point (Taylor series expansion, Section 5.2.3). Hermite interpolation, in
Section 5.2.4, combines the two. Minimizing $\mathcal{L}^\infty$ error is first studied in
Section 5.2.5 and then applied to FIR filter design.

5.2.1 Least-Squares Approximation 

Let $x(t)$ be a real-valued function in $\mathcal{L}^2([a, b])$. An approximation $\hat{x}$ that minimizes

$$\|x - \hat{x}\|_{\mathcal{L}^2}^2 = \int_a^b \big(x(t) - \hat{x}(t)\big)^2\,dt$$

over some set of possibilities for $\hat{x}$ is called a least-squares approximation.
Least-squares approximation of such a function in the subspace $\mathcal{P}_K[a, b]$ of
polynomials of degree at most $K$ is straightforward using the tools developed in
Chapter 1. Consider the $\mathcal{L}^2$ norm of the error between $x$ and $p_K$:[89]

$$\|e_K\|_{\mathcal{L}^2}^2 = \|x - p_K\|_{\mathcal{L}^2}^2 = \int_a^b \big(x(t) - p_K(t)\big)^2\,dt.$$

By the projection theorem, Theorem 1.26, the solution to this approximation
problem is the orthogonal projection of $x$ onto $\mathcal{P}_K[a, b]$. Given an orthonormal
basis $\{\varphi_0(t), \varphi_1(t), \ldots, \varphi_K(t)\}$ for $\mathcal{P}_K[a, b]$, the least-squares
approximation is (see Theorem 1.40)

$$p_K(t) = \sum_{k=0}^{K} \langle x, \varphi_k\rangle\, \varphi_k(t).$$



[89] We explicitly denote here the $\mathcal{L}^2$ norm, but will drop it when there is no risk of confusion.

Figure 5.9: Least-squares approximation of $x(t) = \sin(\pi t/2)$ on $[0, 1]$.



This is a polynomial of degree at most $K$ because each of the $\varphi_k$'s is a polynomial
of degree at most $K$.

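Example 5.1 (Least-squares approximation) Let us find the least-squares
degree-1 polynomial approximation of $x(t) = \sin(\pi t/2)$ on $[0, 1]$. For this,
we need an orthonormal basis for $\mathcal{P}_1[0, 1]$; one such basis is $\{1, \sqrt{3}(2t - 1)\}$.[90]
The least-squares approximation is

$$p_1(t) = \left\langle \sin\tfrac{\pi t}{2}, 1\right\rangle \cdot 1 + \left\langle \sin\tfrac{\pi t}{2}, \sqrt{3}(2t - 1)\right\rangle \sqrt{3}(2t - 1) = \frac{2}{\pi} + \left(\frac{24}{\pi^2} - \frac{6}{\pi}\right)(2t - 1) = \left(\frac{8}{\pi} - \frac{24}{\pi^2}\right) + \left(\frac{48}{\pi^2} - \frac{12}{\pi}\right) t.$$

This approximation is illustrated in Figure 5.9.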
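As a numerical sanity check of Example 5.1, the sketch below approximates the two inner products with Gauss–Legendre quadrature (NumPy's `numpy.polynomial.legendre.leggauss`; the choice of 40 nodes is our assumption, not something the text prescribes) and compares the collected constant and slope with the closed-form values above.

```python
import numpy as np

# Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, 1].
# Assumption: 40 nodes are far more than enough for these smooth integrands.
nodes, weights = np.polynomial.legendre.leggauss(40)
tq = 0.5 * (nodes + 1.0)
wq = 0.5 * weights

def inner(f, g):
    """Approximate the L2([0, 1]) inner product <f, g> by quadrature."""
    return np.sum(wq * f(tq) * g(tq))

x = lambda t: np.sin(np.pi * t / 2)
phi0 = lambda t: np.ones_like(t)
phi1 = lambda t: np.sqrt(3.0) * (2 * t - 1)

c0, c1 = inner(x, phi0), inner(x, phi1)
# p1(t) = c0 * 1 + c1 * sqrt(3) * (2t - 1); collect the constant and slope terms.
const = c0 - np.sqrt(3.0) * c1
slope = 2 * np.sqrt(3.0) * c1
print(const, 8 / np.pi - 24 / np.pi**2)   # both about 0.1148
print(slope, 48 / np.pi**2 - 12 / np.pi)  # both about 1.0437
```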

The method above uses an orthonormal basis for $\mathcal{P}_K[a, b]$. Such an orthonormal
basis is not unique. However, if one seeks a single sequence of vectors $\{\varphi_0, \varphi_1, \ldots\}$
such that, for every $K \in \mathbb{N}$,

$\{\varphi_0, \varphi_1, \ldots, \varphi_K\}$ is an orthonormal basis for $\mathcal{P}_K[a, b]$,

the solution is unique up to multiplications by $\pm 1$ and is obtained by using the
Gram–Schmidt procedure on $\{1, t, t^2, \ldots\}$. This construction creates an orthonormal
basis for $\mathcal{P}_K[a, b]$ that is the union of an orthonormal basis for $\mathcal{P}_{K-1}[a, b]$ and
$\varphi_K$. With this nested sequence of bases, the least-squares approximations satisfy a
recursion:

$$p_K = p_{K-1} + \langle x, \varphi_K\rangle\, \varphi_K. \qquad (5.2)$$
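The nested construction can be sketched numerically. The function below, `gram_schmidt_monomials` (a hypothetical helper, not from the text), runs the Gram–Schmidt procedure on $1, t, \ldots, t^K$ in $\mathcal{L}^2([a, b])$, computing inner products with Gauss–Legendre quadrature (our assumption), and returns power-series coefficients. On $[0, 1]$ it reproduces, up to rounding, the basis $\sqrt{3}(2t - 1)$, $\sqrt{5}(6t^2 - 6t + 1)$, $\sqrt{7}(20t^3 - 30t^2 + 12t - 1)$ used in Example 5.2; because each $\varphi_K$ only adds one new term, recursion (5.2) can be applied directly.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def gram_schmidt_monomials(K, a=0.0, b=1.0, n_quad=64):
    """Orthonormalize 1, t, ..., t^K in L2([a, b]).

    Returns power-series coefficient arrays (lowest degree first) for
    phi_0, ..., phi_K. Inner products use Gauss-Legendre quadrature, which
    is exact (up to rounding) for polynomials of degree below 2 * n_quad.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    w = 0.5 * (b - a) * weights

    def inner(c1, c2):
        return np.sum(w * P.polyval(t, c1) * P.polyval(t, c2))

    basis = []
    for k in range(K + 1):
        v = np.zeros(k + 1)
        v[k] = 1.0                                  # the monomial t^k
        for phi in basis:                           # remove components along earlier phis
            v = P.polysub(v, inner(v, phi) * phi)
        basis.append(v / np.sqrt(inner(v, v)))      # normalize to unit L2 norm
    return basis

basis = gram_schmidt_monomials(3)
for k, c in enumerate(basis):
    print(k, np.round(c, 4))
```

The sign convention here (positive leading coefficient) is one of the $\pm 1$ choices mentioned above.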

Legendre Polynomials A well-known class of polynomials that can be used for least-squares
approximation is the class of Legendre polynomials,

$$L_k(t) = \frac{(-1)^k}{2^k\, k!} \frac{d^k}{dt^k} (1 - t^2)^k, \qquad k \in \mathbb{N}, \qquad (5.3)$$

orthogonal on $[-1, 1]$. The first few are shown in Figure 5.10 and tabulated below:

$$\begin{aligned}
L_0(t) &= 1, & L_3(t) &= \tfrac{1}{2}(5t^3 - 3t),\\
L_1(t) &= t, & L_4(t) &= \tfrac{1}{8}(35t^4 - 30t^2 + 3),\\
L_2(t) &= \tfrac{1}{2}(3t^2 - 1), & L_5(t) &= \tfrac{1}{8}(63t^5 - 70t^3 + 15t).
\end{aligned}$$
Figure 5.10: The first six Legendre polynomials (up to degree 5), $\{L_k\}_{k=0}^{5}$.



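Legendre polynomials are orthogonal but not orthonormal; an orthonormal set can
be obtained by dividing by the norms $\|L_k\| = \sqrt{2/(2k + 1)}$. The first few in
normalized form are:

$$\begin{aligned}
\tilde{L}_0(t) &= \tfrac{1}{\sqrt{2}}, & \tilde{L}_3(t) &= \tfrac{1}{2}\sqrt{\tfrac{7}{2}}\,(5t^3 - 3t),\\
\tilde{L}_1(t) &= \sqrt{\tfrac{3}{2}}\,t, & \tilde{L}_4(t) &= \tfrac{1}{8}\sqrt{\tfrac{9}{2}}\,(35t^4 - 30t^2 + 3),\\
\tilde{L}_2(t) &= \tfrac{1}{2}\sqrt{\tfrac{5}{2}}\,(3t^2 - 1), & \tilde{L}_5(t) &= \tfrac{1}{8}\sqrt{\tfrac{11}{2}}\,(63t^5 - 70t^3 + 15t).
\end{aligned}$$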
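To check the Rodrigues formula (5.3) and the norms $\|L_k\| = \sqrt{2/(2k+1)}$ numerically, one can compare against NumPy's Legendre module. The sketch below is our own illustration: `legendre_rodrigues` is a hypothetical helper, and the Gram matrix uses 20-point Gauss–Legendre quadrature, which is exact for the degree-10 integrands involved.

```python
import numpy as np
from math import factorial
from numpy.polynomial import polynomial as P
from numpy.polynomial import legendre as Lg

def legendre_rodrigues(k):
    """Power-series coefficients of L_k from the Rodrigues formula (5.3)."""
    c = P.polyder(P.polypow([1.0, 0.0, -1.0], k), k)   # d^k/dt^k (1 - t^2)^k
    return (-1) ** k / (2 ** k * factorial(k)) * c

# Gram matrix of L_0, ..., L_5 on [-1, 1] via Gauss-Legendre quadrature.
nodes, weights = Lg.leggauss(20)
vals = np.array([Lg.legval(nodes, [0] * k + [1]) for k in range(6)])
G = (vals * weights) @ vals.T
print(np.round(G, 6))              # diagonal entries 2/(2k+1), zeros elsewhere
print(legendre_rodrigues(5))       # coefficients of (63t^5 - 70t^3 + 15t)/8
```

Dividing row $k$ of `vals` by $\sqrt{2/(2k+1)}$ would give the normalized $\tilde{L}_k$ tabulated above.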



Once we have the Legendre polynomials, changing the interval of interest does
not require a tedious application of the Gram–Schmidt procedure as we did
in Solved Exercise 1.5. Instead, polynomials orthogonal with respect to the $\mathcal{L}^2$ inner
product on $[a, b]$ can be found by shifting and scaling the Legendre polynomials; see
Exercise 5.1.

Example 5.2 (Least-squares approximation with Legendre polynomials)
Let $x(t) = t \sin 5t$. To form least-squares polynomial approximations on $[0, 1]$,
we need orthogonal polynomials in $\mathcal{L}^2([0, 1])$; we can obtain these by shifting and
scaling the Legendre polynomials. The first few, in normalized form, are:

$$\begin{aligned}
\varphi_0(t) &= 1, & \varphi_2(t) &= \sqrt{5}\,(6t^2 - 6t + 1),\\
\varphi_1(t) &= \sqrt{3}\,(2t - 1), & \varphi_3(t) &= \sqrt{7}\,(20t^3 - 30t^2 + 12t - 1).
\end{aligned}$$

The best constant approximation is $p_0 = \langle x, \varphi_0\rangle \varphi_0$, and higher-degree least-squares
approximations can be found through (5.2). The least-squares approximations up to
degree 3 are:

$$\begin{aligned}
p_0(t) &\approx -0.0951, & p_2(t) &\approx -0.109 + 2.42t - 3.59t^2,\\
p_1(t) &\approx 0.489 - 1.17t, & p_3(t) &\approx -0.204 + 3.56t - 6.43t^2 + 1.90t^3.
\end{aligned}$$

The resulting approximation errors are shown in Figure 5.11.
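The recursion (5.2) with shifted Legendre polynomials can be reproduced in a few lines. This sketch assumes NumPy's `Legendre.basis(k, domain=[0, 1])`, which evaluates $L_k(2t - 1)$, and `convert(kind=Polynomial)` to obtain power-series coefficients; inner products use Gauss–Legendre quadrature with 40 nodes (our choice). The printed coefficients should match the rounded values above.

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial

x = lambda t: t * np.sin(5 * t)

# Gauss-Legendre quadrature on [0, 1] for the L2 inner products.
nodes, weights = np.polynomial.legendre.leggauss(40)
tq, wq = 0.5 * (nodes + 1.0), 0.5 * weights

def phi(k):
    """Normalized shifted Legendre polynomial on [0, 1], as a power series in t."""
    Lk = Legendre.basis(k, domain=[0, 1])              # L_k(2t - 1)
    return float(np.sqrt(2 * k + 1)) * Lk.convert(kind=Polynomial)

p = Polynomial([0.0])
approximations = []
for k in range(4):
    ck = float(np.sum(wq * x(tq) * phi(k)(tq)))         # <x, phi_k>
    p = p + ck * phi(k)                                 # recursion (5.2)
    approximations.append(p)
    print(k, np.round(p.coef, 3))
```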



[90] This particular basis comes from applying the Gram–Schmidt procedure to $\{1, t\}$.

Figure 5.11: Errors of least-squares polynomial approximations of $x(t) = t \sin 5t$ on
$[0, 1]$, for degrees $K = 0, 1, 2, 3$. Curves are labeled by the polynomial degree.



Orthogonal Polynomials While Legendre polynomials and their shifted versions
solve our least-squares approximation problems, changing the inner product to

$$\langle x, y\rangle = \int_a^b x(t)\, y(t)\, W(t)\, dt,$$

where $W(t)$ is a nonnegative weight function, would change the orthogonality
relationships among polynomials. One ramification is that applying the Gram–Schmidt
procedure to the ordered monomials $\{1, t, t^2, \ldots\}$ would yield a different
set of polynomials. Also, if the weight function decays adequately, inner products
between polynomials on $[0, \infty)$ or $(-\infty, \infty)$ can be finite, so we need not consider
only finite intervals. In Chapter at a Glance, we give definitions, weight functions,
and intervals of orthogonality for different families of polynomials.

Orthogonal polynomials have many applications in approximation theory and
numerical analysis; Exercises 5.2 and 5.3 explore some of their properties. We will
consider Chebyshev polynomials in Section 5.2.5 because they play an important
role in $\mathcal{L}^\infty$ approximation.

5.2.2 Lagrange Interpolation: Matching Points 

In many situations in which we desire a polynomial approximation of a function
on an interval, we cannot use inner products of the function with a set of basis
functions, because we may not know the function on the entire interval or we may
not want to compute the required integrals.[91] However, if we know the values of the
function at certain points in the interval, we can try to match a polynomial to the
function at these points. We now look at when such matching exists, when
it is unique, and how to bound the approximation error.

From (5.1a), a polynomial of degree $K$ has $K + 1$ parameters and is thus
determined by $K + 1$ independent and noncontradictory constraints. Let us look
carefully at whether specifying $p_K$ at $K + 1$ points provides suitable constraints.



[91] The difficulty of computing integrals may be the reason for forming a polynomial
approximation, leading to numerical integration techniques such as the trapezoidal rule and
Simpson's rule.
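Fix a set of $K + 1$ distinct points $\{t_k\}_{k=0}^{K} \subset [a, b]$ called nodes, and assume that
the values of the function $x$ at the nodes, $y_k = x(t_k)$, $k = 0, 1, \ldots, K$, are known.
Requiring $p_K$ to match $x$ at the nodes gives a system of $K + 1$ linear equations with
$K + 1$ unknowns,

$$\begin{bmatrix} 1 & t_0 & t_0^2 & \cdots & t_0^K \\ 1 & t_1 & t_1^2 & \cdots & t_1^K \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t_K & t_K^2 & \cdots & t_K^K \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_K \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_K \end{bmatrix}. \qquad (5.4)$$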



The matrix in (5.4) is a square Vandermonde matrix, and it is invertible if and only
if the nodes are distinct; see (1.23). Thus, the values of $x(t)$ at $K + 1$ distinct
nodes uniquely specify an interpolation polynomial of degree at most $K$. As we
know from Section 1.B, having more samples (equations) will not lead to a solution:
the range of the Vandermonde matrix will be a proper subspace, and the vector of
samples will (in general) not be in that subspace. On the other hand, having fewer
samples always leaves the polynomial unspecified, as there will be infinitely many
solutions to (5.4).
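A direct way to see (5.4) at work is to build the Vandermonde matrix and solve it. The sketch below uses `numpy.vander` with `increasing=True` so that row $k$ is $[1, t_k, \ldots, t_k^K]$; the helper name `interp_coefficients` and the test function $t \sin 5t$ are our illustrative choices. It also shows that a repeated node makes the matrix rank-deficient, so the system no longer has a unique solution.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def interp_coefficients(nodes, values):
    """Solve the Vandermonde system (5.4) for a_0, ..., a_K."""
    V = np.vander(nodes, increasing=True)     # row k: [1, t_k, t_k^2, ..., t_k^K]
    return np.linalg.solve(V, values)

x = lambda t: t * np.sin(5 * t)
K = 3
nodes = np.arange(K + 1) / K                  # evenly spaced nodes k/K
a = interp_coefficients(nodes, x(nodes))
print(np.max(np.abs(P.polyval(nodes, a) - x(nodes))))   # ~0: p_K matches x at the nodes

# Repeating a node makes the Vandermonde matrix singular (its rank drops).
print(np.linalg.matrix_rank(np.vander([0.0, 0.5, 0.5, 1.0], increasing=True)))  # 3
```

For large $K$ the Vandermonde matrix becomes badly conditioned, so this direct solve is best kept to modest degrees.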

To see this, suppose that $p_{K-1}$ is the degree-$(K-1)$ polynomial uniquely
specified by $\{(t_k, y_k)\}_{k=0}^{K-1}$. For any $c \in \mathbb{R}$, the polynomial

$$p_K(t) = p_{K-1}(t) + c\,(t - t_0)(t - t_1)\cdots(t - t_{K-1})$$

will match $p_{K-1}$ at all $K$ nodes $t_0, t_1, \ldots, t_{K-1}$ because the second term is zero at each of them.
Given the value at one additional node, $y_K = x(t_K)$, one can choose $c$ so that $p_K$
matches $x$ at all $K + 1$ nodes. This recursive process is one way to arrive at the general
formula that follows.



Lagrange Interpolation Formula The unique polynomial of degree at most $K$ that matches
$x$ at $K + 1$ distinct nodes $\{t_k\}_{k=0}^{K}$ is given by

$$p_K(t) = \sum_{k=0}^{K} x(t_k) \prod_{\substack{i=0\\ i \neq k}}^{K} \frac{t - t_i}{t_k - t_i}. \qquad (5.5)$$

The polynomial $p_K(t)$ is called the Lagrange interpolating polynomial for $\{(t_k, x(t_k))\}_{k=0}^{K}$.
It interpolates correctly because

$$\prod_{\substack{i=0\\ i \neq k}}^{K} \frac{t_\ell - t_i}{t_k - t_i} = \begin{cases} 1, & \text{for } \ell = k;\\ 0, & \text{otherwise}, \end{cases}$$

and the interpolating polynomial is unique.
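Formula (5.5) translates almost literally into code. In the sketch below, `lagrange_eval` is a hypothetical helper that evaluates the Lagrange form at arbitrary points; as a cross-check (our addition), it is compared with the coefficients obtained by solving the Vandermonde system (5.4) on the same, deliberately uneven, nodes.

```python
import numpy as np

def lagrange_eval(nodes, values, t):
    """Evaluate the Lagrange interpolating polynomial (5.5) at the points t."""
    t = np.asarray(t, dtype=float)
    p = np.zeros_like(t)
    for k, tk in enumerate(nodes):
        ell = np.ones_like(t)                 # product over i != k of (t - t_i)/(t_k - t_i)
        for i, ti in enumerate(nodes):
            if i != k:
                ell *= (t - ti) / (tk - ti)
        p += values[k] * ell
    return p

x = lambda t: t * np.sin(5 * t)
nodes = np.array([0.0, 0.3, 0.55, 1.0])       # any distinct nodes work
t = np.linspace(0, 1, 101)
p_lagrange = lagrange_eval(nodes, x(nodes), t)
# Cross-check against the Vandermonde solution of (5.4).
a = np.linalg.solve(np.vander(nodes, increasing=True), x(nodes))
p_vander = np.polynomial.polynomial.polyval(t, a)
print(np.max(np.abs(p_lagrange - p_vander)))  # rounding-level difference
```

Both routes give the same polynomial, as uniqueness demands; they differ only in numerical behavior.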

Example 5.3 (Lagrange interpolation) Let us construct approximations
using (5.5) for two functions on $[0, 1]$, one continuous and the other not,

$$x_1(t) = t \sin 5t \qquad \text{and} \qquad x_2(t) = \begin{cases} t, & 0 \le t < 1/\sqrt{2};\\ t - 1, & 1/\sqrt{2} \le t \le 1. \end{cases} \qquad (5.6)$$
Figure 5.12: Polynomial interpolation of the functions in (5.6) for degrees $K = 1, 2, \ldots, 7$
and points taken uniformly on $[0, 1]$. Bold curves are the original functions and light curves
are labeled by the polynomial degree.



Let the nodes be $t_k = k/K$ for $k = 0, 1, \ldots, K$ (although evenly spaced nodes are
not a requirement). Figure 5.12 shows the functions (bold) and the interpolating
polynomials for $K = 1, 2, \ldots, 7$. The continuous function $x_1$ is approximated
much more closely than the discontinuous function $x_2$.
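The contrast in Example 5.3 can be quantified by measuring the maximum error on a dense grid (a grid of 20001 points is our assumption). With $K + 1$ distinct nodes and degree $K$, NumPy's `numpy.polynomial.polynomial.polyfit` returns the interpolating polynomial, so we use it here as a convenient solver. The $K = 1, 2$ values for $x_1$ should agree with the Lagrange row of Table 5.1, while the error for $x_2$ cannot drop below about $1/2$, half the jump at $1/\sqrt{2}$.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x1 = lambda t: t * np.sin(5 * t)
x2 = lambda t: np.where(t < 1 / np.sqrt(2), t, t - 1)   # discontinuous function in (5.6)

t = np.linspace(0, 1, 20001)          # dense evaluation grid (assumption)
err1, err2 = {}, {}
for K in range(1, 8):
    nodes = np.arange(K + 1) / K
    # With K + 1 distinct nodes and degree K, the fit passes through every node.
    c1 = P.polyfit(nodes, x1(nodes), K)
    c2 = P.polyfit(nodes, x2(nodes), K)
    err1[K] = np.max(np.abs(x1(t) - P.polyval(t, c1)))
    err2[K] = np.max(np.abs(x2(t) - P.polyval(t, c2)))
    print(K, round(err1[K], 3), round(err2[K], 3))
```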

This example suggests that, when polynomial interpolation is used, the smoothness
of a function affects the quality of approximation. Indeed, for functions that are
sufficiently smooth over the range of interest (encompassing the nodes but also
wherever the approximation is to be evaluated), the pointwise error can be bounded
precisely using the following theorem:

Theorem 5.1 (Error of Lagrange interpolation) Let $x$ have $K + 1$ continuous
derivatives on $[a, b]$ for some $K \ge 0$, and let $\{t_k\}_{k=0}^{K} \subset [a, b]$ be distinct.
Then, with $p_K$ defined in (5.5) and $t \in [a, b]$,

$$e_K(t) = \frac{\prod_{k=0}^{K} (t - t_k)}{(K+1)!}\, x^{(K+1)}(\xi) \qquad (5.7a)$$

for some $\xi$ between the minimum and maximum of $\{t, t_0, t_1, \ldots, t_K\}$. Thus,

$$|e_K(t)| \le \frac{1}{(K+1)!} \prod_{k=0}^{K} |t - t_k| \max_{\xi \in [a,b]} \big|x^{(K+1)}(\xi)\big| \qquad (5.7b)$$

for every $t \in [a, b]$.
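The pointwise bound (5.7b) is easy to test on a function whose derivatives are all known. The sketch below uses $x(t) = e^t$ on $[0, 1]$ (our choice, not an example from the text), for which $\max_{\xi} |x^{(K+1)}(\xi)| = e$, with evenly spaced nodes. It checks the inequality at every grid point; the small absolute slack of $10^{-9}$ is our assumption, added only to absorb floating-point rounding in the fit near the nodes, where both sides are nearly zero.

```python
import numpy as np
from math import e, factorial
from numpy.polynomial import polynomial as P

t = np.linspace(0, 1, 10001)
holds = {}
for K in range(1, 7):
    nodes = np.arange(K + 1) / K
    c = P.polyfit(nodes, np.exp(nodes), K)              # degree-K interpolant of e^t
    err = np.abs(np.exp(t) - P.polyval(t, c))
    omega = np.abs(np.prod(t[:, None] - nodes[None, :], axis=1))
    bound = omega * e / factorial(K + 1)                # (5.7b) with max |x^(K+1)| = e
    holds[K] = bool(np.all(err <= bound + 1e-9))        # slack absorbs rounding only
    print(K, err.max(), bound.max())
```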



One immediate consequence of the theorem is that if $x$ is a polynomial of degree at most $K$,
the interpolant $p_K$ matches $x$ everywhere. This follows from (5.7b) because $x^{(K+1)}$
is identically zero. Of course, this was already obvious from the uniqueness of
the interpolating polynomial. In general, the $\max_{\xi \in [a,b]} |x^{(K+1)}(\xi)|$ factor in the
error bound is a global smoothness measure; it affects the pointwise error bound
identically over the entire interval.

Figure 5.13: Bases of degree-$K$ Lagrange interpolating polynomials for $K + 1$ nodes
evenly spaced over $[0, 1]$: $t_k = k/K$, $k = 0, 1, \ldots, K$. Panels: (a) $K = 5$; (b) $K = 11$.

One interesting aspect of the error bound (5.7b) is that it depends on $t$ through
the $\prod_{k=0}^{K} |t - t_k|$ factor. The error is zero at any node because of the interpolating
property, but the bound behaves differently in the neighborhoods of different nodes.
Moving away from a node $t_k$ by a small amount $\epsilon$, that is, setting $t = t_k + \epsilon$, we have

$$\prod_{i=0}^{K} |t - t_i| = |\epsilon| \prod_{\substack{i=0\\ i \neq k}}^{K} |t_k + \epsilon - t_i| \approx |\epsilon| \prod_{\substack{i=0\\ i \neq k}}^{K} |t_k - t_i|.$$

If the nodes are evenly spaced, the error bound becomes worse more quickly around
an extremal node than around a central node (see also Figure 5.12). The potential
for worse behavior at the endpoints can also be seen if we view the Lagrange
interpolation formula as a series expansion using the $K + 1$ polynomials

$$\ell_{k,K}(t) = \prod_{\substack{i=0\\ i \neq k}}^{K} \frac{t - t_i}{t_k - t_i}, \qquad k = 0, 1, \ldots, K, \qquad (5.8)$$

as a basis.



Each basis function depends on all of the nodes; two examples with
evenly spaced nodes are shown in Figure 5.13. These illustrate that at node $t_k$,
basis function $\ell_{k,K}$ is 1 and the other basis functions are 0. Also, $\ell_{k,K}(t)$ is not
necessarily small far away from $t_k$. This means that the sample $x(t_k)$ affects the
interpolation far from $t_k$. In particular, Figure 5.13(b) illustrates that unless $K$ is
small, the basis functions become large near the ends of the approximation interval.
This is problematic for controlling the pointwise error. We will see later that a
better spacing of nodes can improve the situation substantially.
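The growth of the basis functions (5.8) for evenly spaced nodes can be measured directly. The sketch below evaluates all $\ell_{k,K}$ on a grid for the two cases of Figure 5.13 and records the largest magnitude. It also records where $\sum_k |\ell_{k,K}(t)|$ peaks; that sum is commonly called the Lebesgue function, a term we introduce here and that the text does not use. Its peak lies in an end subinterval, consistent with the observation above. The helper `lagrange_basis` is hypothetical.

```python
import numpy as np

def lagrange_basis(nodes, k, t):
    """Basis polynomial ell_{k,K}(t) of (5.8) for the given nodes."""
    ell = np.ones_like(t)
    for i, ti in enumerate(nodes):
        if i != k:
            ell = ell * (t - ti) / (nodes[k] - ti)
    return ell

t = np.linspace(0, 1, 20001)
peak, argmax_t = {}, {}
for K in (5, 11):
    nodes = np.arange(K + 1) / K                        # evenly spaced, as in Figure 5.13
    basis = np.array([lagrange_basis(nodes, k, t) for k in range(K + 1)])
    peak[K] = np.abs(basis).max()                       # largest |ell_{k,K}(t)| over k and t
    lebesgue = np.abs(basis).sum(axis=0)                # sum over k of |ell_{k,K}(t)|
    argmax_t[K] = t[np.argmax(lebesgue)]
    print(K, round(peak[K], 2), round(lebesgue.max(), 2), argmax_t[K])
```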



5.2.3 Taylor Series Expansion: Matching Derivatives 

In the previous section, we used the Vandermonde system (5.4) to establish that
matching $K + 1$ sample values uniquely specifies a degree-$K$ polynomial
approximation to a real-valued function on $[a, b]$. Other ways of specifying
$K + 1$ constraints exist, one of the most familiar being the Taylor series expansion.

Figure 5.14: Taylor series expansions of $x_1(t)$ in (5.6) for degrees $K = 1, 2, \ldots, 5$. Bold
curve is the original function and light curves are labeled by the polynomial degree.

Taylor Series Expansion Assuming $x$ has $K$ derivatives at $t_0$, let

$$p_K(t) = \sum_{k=0}^{K} x^{(k)}(t_0)\, \frac{(t - t_0)^k}{k!}. \qquad (5.9)$$

We have seen this expression in (P1.65-5); it is called a Taylor series expansion
around $t_0$. It has the property that $p_K$ and $x$ have equal derivatives of order
$0, 1, \ldots, K$ at $t_0$. (The zeroth derivative is the function itself.)

Example 5.4 (Taylor series expansion) Consider Taylor series expansions
of the functions $x_1$ and $x_2$ in (5.6) around $1/2$. Both $x_1$ and $x_2$ are infinitely
differentiable at $1/2$, so their Taylor series are easy to compute. Figure 5.14
shows $x_1$ (bold) and its expansions of degree $K = 1, 2, \ldots, 5$. The Taylor series
for $x_2$ is not interesting to plot because $p_K(t) = t$ for all degrees $K \ge 1$.
While this is exact for $t \in [0, 1/\sqrt{2})$, the error for $t \in [1/\sqrt{2}, 1]$ cannot be reduced
by increasing the degree.
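For $x_1(t) = t \sin 5t$, the derivatives needed in (5.9) have a closed form via the Leibniz rule, $(t f)^{(k)} = t f^{(k)} + k f^{(k-1)}$ with $f^{(j)}(t) = 5^j \sin(5t + j\pi/2)$. The sketch below (helper names are ours) builds the expansions around $1/2$ and measures their maximum errors on $[0, 1]$ with a dense grid. The $K = 0, \ldots, 3$ values should land close to the Taylor row of Table 5.1, illustrating that raising the degree does not steadily reduce the error far from $t_0$.

```python
import numpy as np
from math import factorial

def deriv(k, t):
    """k-th derivative of x1(t) = t sin(5t), via the Leibniz rule
    (t f)^(k) = t f^(k) + k f^(k-1), with f^(j)(t) = 5^j sin(5t + j*pi/2)."""
    d = t * 5.0**k * np.sin(5 * t + k * np.pi / 2)
    if k >= 1:
        d += k * 5.0**(k - 1) * np.sin(5 * t + (k - 1) * np.pi / 2)
    return d

def taylor(K, t0, t):
    """Degree-K Taylor expansion (5.9) of x1 around t0, evaluated at t."""
    return sum(deriv(k, t0) * (t - t0)**k / factorial(k) for k in range(K + 1))

t = np.linspace(0, 1, 20001)
x1 = t * np.sin(5 * t)
max_err = {K: np.max(np.abs(x1 - taylor(K, 0.5, t))) for K in range(6)}
for K, err in max_err.items():
    print(K, round(err, 3))
```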

A Taylor series expansion has the peculiar property of getting all its information
about $x$ from an infinitesimal interval around $t_0$. As Example 5.4 illustrated,
this means the approximation can differ from the original function by an arbitrary
amount if the function is discontinuous. For functions with sufficiently many
continuous derivatives, a precise error bound is given by the following theorem:



Theorem 5.2 (Error of Taylor series expansion) Let $x$ have $K + 1$ continuous
derivatives on $[a, b]$ for some $K \ge 0$, and let $t_0 \in [a, b]$. Then, with $p_K$
defined in (5.9),

$$e_K(t) = \frac{(t - t_0)^{K+1}}{(K+1)!}\, x^{(K+1)}(\xi) \qquad (5.10a)$$

for some $\xi$ between $t$ and $t_0$. Thus,

$$|e_K(t)| \le \frac{|t - t_0|^{K+1}}{(K+1)!} \max_{\xi \in [a,b]} \big|x^{(K+1)}(\xi)\big| \qquad (5.10b)$$

for every $t \in [a, b]$.

Figure 5.15: Errors of polynomial interpolation of $x_1(t) = t \sin 5t$ from Examples 5.3
and 5.4. Curves are labeled by the polynomial degree. Panels: (a) Lagrange; (b) Taylor.



The error bounds (5.7b) and (5.10b) are very similar. Both are proportional to

$$\frac{1}{(K+1)!} \max_{\xi \in [a,b]} \big|x^{(K+1)}(\xi)\big|,$$

the global smoothness measure. They differ in the dependence on $t$, but in a way
that is consistent: if every $t_k$ in (5.7b) is replaced by $t_0$, then the two error bounds
are identical. Lagrange interpolation depends on the nodes being distinct, so we
cannot literally make all the $t_k$'s equal. However, having $K + 1$ nodes distinct but
closely clustered is similar to having derivatives up to order $K$ at a single node.
This is explored further in Exercise 5.5.

Moreover, these bounds require greater smoothness as the polynomial degree
is increased. Furthermore, there exist infinitely differentiable functions for which
these bounds do not even decrease as $K$ is increased; see Exercises 5.4 and 5.6.



5.2.4 Hermite Interpolation: Matching Points and Derivatives 

A natural combination of the ideas of Lagrange interpolation and Taylor series is 
to determine a polynomial by fixing some number of derivatives at each of several 
nodes. This is called Hermite interpolation. 

Example 5.5 (Hermite interpolation) Suppose that we are given the values
of a function $x$ and its derivative at $t_0$ and $t_1$:

$$y_0 = x(t_0), \qquad z_0 = x'(t_0), \qquad y_1 = x(t_1), \qquad z_1 = x'(t_1).$$

We want to check that fixing these four values uniquely determines a cubic
polynomial. We write the cubic polynomial and its derivative,

$$p_3(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3, \qquad p_3'(t) = a_1 + 2a_2 t + 3a_3 t^2,$$

and find their values at $t_0$ and $t_1$:

$$\begin{bmatrix} 1 & t_0 & t_0^2 & t_0^3 \\ 1 & t_1 & t_1^2 & t_1^3 \\ 0 & 1 & 2t_0 & 3t_0^2 \\ 0 & 1 & 2t_1 & 3t_1^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ z_0 \\ z_1 \end{bmatrix}. \qquad (5.11)$$

The matrix in this equation is invertible for any distinct pair $(t_0, t_1)$ because its
determinant is $-(t_1 - t_0)^4 \neq 0$. Thus, the polynomial is uniquely determined.
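System (5.11) can be set up and solved directly. In the sketch below, `hermite_cubic` is a hypothetical helper, and the data (values and slopes of $t \sin 5t$ at $t_0 = 0.2$, $t_1 = 0.7$) are purely illustrative. It confirms the determinant $-(t_1 - t_0)^4$ and that the resulting cubic matches both values and both derivatives.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def hermite_cubic(t0, t1, y0, y1, z0, z1):
    """Return the matrix of (5.11) and the cubic's coefficients a_0, ..., a_3."""
    M = np.array([
        [1.0, t0, t0**2, t0**3],
        [1.0, t1, t1**2, t1**3],
        [0.0, 1.0, 2 * t0, 3 * t0**2],
        [0.0, 1.0, 2 * t1, 3 * t1**2],
    ])
    return M, np.linalg.solve(M, np.array([y0, y1, z0, z1]))

# Illustrative data: x(t) = t sin 5t and its derivative at t0 = 0.2, t1 = 0.7.
x = lambda t: t * np.sin(5 * t)
dx = lambda t: np.sin(5 * t) + 5 * t * np.cos(5 * t)
t0, t1 = 0.2, 0.7
M, a = hermite_cubic(t0, t1, x(t0), x(t1), dx(t0), dx(t1))
print(np.linalg.det(M), -(t1 - t0)**4)          # both -0.0625 up to rounding
print(P.polyval(t1, a) - x(t1), P.polyval(t1, P.polyder(a)) - dx(t1))  # both ~0
```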

More generally, suppose that derivatives of order $0, 1, \ldots, d_k$ are specified at
each distinct node $t_k$ for $k = 0, 1, \ldots, L$. Since the constraints are independent,
a polynomial of degree $K = \left(\sum_{k=0}^{L} (d_k + 1)\right) - 1$ can be uniquely determined. A
proof of this and an error bound generalizing (5.7b) and (5.10b) are outlined in
Exercise 5.7.

5.2.5 Minimax Polynomial Approximation 

The techniques we have studied thus far determine a polynomial approximation
through linear operations. Least-squares approximations, minimizing an $\mathcal{L}^2$ error
metric, come through the linear operation of orthogonal projection to a subspace,
and the interpolation methods use some combination of samples of a function and
its derivatives to fix a polynomial approximation through the solution of a system
of linear equations. We now turn to the minimization of pointwise, or $\mathcal{L}^\infty$, error.
Since $\mathcal{L}^\infty([a, b])$ is not a Hilbert space, we do not have geometric notions such
as the orthogonality of the error to the subspace of polynomials $\mathcal{P}_K[a, b]$ to guide
the determination of the optimal approximation. We will see that the optimal
polynomial approximation is generally more difficult to compute, but interpolation
with specially chosen nodes is nearly optimal.

Let $x(t)$ be a real-valued function in $\mathcal{L}^2([a, b])$. An approximation $\hat{x}$ that
minimizes

$$\|x - \hat{x}\|_\infty = \max_{t \in [a,b]} |x(t) - \hat{x}(t)|$$

over some set of possibilities for $\hat{x}$ is called a minimax approximation because it
minimizes the maximum error. The following theorem shows that the set of polynomials
is rich enough to enable arbitrarily small $\mathcal{L}^\infty$ error for any continuous function. A
constructive proof is given in Exercise 5.8.



Theorem 5.3 (Weierstrass approximation theorem) Let $x$ be continuous
on $[a, b]$ and let $\epsilon > 0$. Then, there exists a polynomial $p$ for which

$$|e(t)| = |x(t) - p(t)| < \epsilon \qquad \text{for every } t \in [a, b]. \qquad (5.12)$$
Denote by $p_K(t)$ the minimax approximation among polynomials of degree at most
$K$, and let $e_{p,K}(t)$ be the error of that approximation, with $\mathcal{L}^\infty$ norm $\|e_{p,K}\|_\infty$. The
theorem states that the mere continuity of $x$ is enough to ensure that $\|e_{p,K}\|_\infty$ can
be made arbitrarily small by choosing $K$ large enough. This contrasts starkly with
the $\mathcal{L}^\infty$ error bounds that we have seen thus far, (5.7b) for Lagrange interpolation
and (5.10b) for Taylor series expansion, which require greater smoothness as the
polynomial degree is increased.

Sometimes bounds can be unduly pessimistic even when performance is reasonable.
With Lagrange interpolation, the difficulty from an $\mathcal{L}^\infty$ perspective is that the
maximum error between consecutive nodes can differ greatly from one pair of nodes to another,
so the $\mathcal{L}^\infty$ error can be large even when the function and its approximation are close
over most of the approximation interval. When the nodes are evenly spaced, the maximum
error tends to be largest near the ends of the approximation interval; we observed this in
Figure 5.15(a), and it is connected also to the large peaks shown in Figure 5.13. The large
oscillations near the endpoints can be so dramatic that the $\mathcal{L}^\infty$ error diverges as $K$
increases, even for a $C^\infty$ function; see Exercise [531. Thus, we must move beyond
interpolation with evenly spaced nodes for minimax approximation. Moreover, with an
approximating polynomial $q_K(t)$, it never pays to have fewer than $K + 1$ points at which
the error is zero. Both bounding the minimax error and computing a minimax approximation
depend on understanding what happens between these points.

Let $\{t_k\}_{k=0}^{K} \subset [a, b]$ be $K + 1$ distinct and ordered points selected to partition
$[a, b]$ into $K + 2$ subintervals:

$$[a, b] = \bigcup_{k=0}^{K+1} I_k, \qquad \text{where } I_k = \begin{cases} [a, t_0), & \text{for } k = 0;\\ [t_{k-1}, t_k), & \text{for } k = 1, 2, \ldots, K;\\ [t_K, b], & \text{for } k = K + 1. \end{cases} \qquad (5.13)$$

(While it is natural to think of $e_{q,K}(t_k) = x(t_k) - q_K(t_k) = 0$ for $k = 0, 1, \ldots, K$,
this is not necessary for the developments that follow.) Since the subintervals cover
$[a, b]$, the $\mathcal{L}^\infty$ error is the maximum of the errors on the subintervals:

$$\|e_{q,K}\|_\infty = \max_{k = 0, 1, \ldots, K+1}\ \max_{t \in I_k} |e_{q,K}(t)|.$$

The following theorem shows that for a minimax approximation, the error $e_{q,K}$
should oscillate in sign from one subinterval $I_k$ to the next, and the maximum
absolute value of the error should be the same on every subinterval.

Theorem 5.4 (de la Vallée-Poussin alternation theorem [6]) Let $x$ be
continuous on $[a, b]$ and let $[a, b]$ be partitioned as in (5.13) for some $K \in \mathbb{N}$.
Suppose there exist a polynomial $q_K$ of degree at most $K$ and numbers $s_k \in I_k$
for $k = 0, 1, \ldots, K + 1$, such that $e_{q,K}(s_k)$ alternates in sign. Then

$$\min_{k = 0, 1, \ldots, K+1} |e_{q,K}(s_k)| \le \|e_{p,K}\|_\infty \le \|e_{q,K}\|_\infty, \qquad (5.14)$$

where $e_{p,K}$ is the minimax approximation error among polynomials of degree at
most $K$.
The first of the two inequalities in (5.14) is the interesting one; the second simply
states that the minimax approximation $p_K$ is at least as good as $q_K$.

To use the theorem, we must find an approximating polynomial $q_K$ that creates
enough alternations in the sign of the error. Then, the strongest statement is
obtained by choosing $s_k$ such that each $|e_{q,K}(s_k)|$ is maximum on the subinterval
$I_k$. The main result of the theorem is that there is no way to change the polynomial
to push the error uniformly below the smallest of the local maxima of $|e_{q,K}(t)|$.
Intuitively, pushing the worst (largest) of the local maxima down inevitably has
the effect of pushing the best (smallest) of the local maxima up. The next theorem
makes the stronger statement that equality of the local maxima happens if and only
if the minimax approximation has been found.



Theorem 5.5 (Chebyshev equioscillation theorem [6]) Let $x$ be continuous
on $[a, b]$, and let $K \in \mathbb{N}$. Denote the minimax approximation error among
polynomials of degree at most $K$ by $e_{p,K}$. The minimax approximation $p_K$ is
unique and determined by the following property: there are at least $K + 2$ points

$$a \le s_0 < s_1 < \cdots < s_{K+1} \le b$$

for which

$$x(s_k) - p_K(s_k) = \sigma (-1)^k \|e_{p,K}\|_\infty, \qquad k = 0, 1, \ldots, K + 1,$$

where $\sigma = \pm 1$, independent of $k$.

We illustrate both theorems by continuing our previous examples. 

Example 5.6 (Minimax approximation) Consider again approximating $x(t) = t \sin 5t$
on $[0, 1]$. To draw conclusions from Theorem 5.4, we must find a $q_K(t)$
such that the error $e_{q,K}(t)$ changes sign at least $K + 1$ times. This is true for the
least-squares approximations computed in Example 5.2, shown in Figure 5.11. For each
degree $K \in \{0, 1, 2, 3\}$, the error $e_{q,K}(t)$ does indeed change sign at least $K + 1$
times, allowing us to choose $K + 2$ values $s_k \in I_k$ to apply Theorem 5.4. (The
error of the degree-2 approximation has one more sign change than necessary.)
By choosing the $s_k$'s to give maxima of $|e_{q,K}(t)|$ in each interval, we get from (5.14):

$$0.459 \le \|e_{p,0}\|_\infty, \qquad 0.331 \le \|e_{p,1}\|_\infty, \qquad 0.0717 \le \|e_{p,2}\|_\infty, \qquad 0.0956 \le \|e_{p,3}\|_\infty.$$

The first four minimax polynomial approximations of $x(t) = t \sin 5t$ and their
$\mathcal{L}^\infty$ errors are:

$$\begin{aligned}
p_0(t) &\approx -0.297, & \|e_{p,0}\|_\infty &\approx 0.661,\\
p_1(t) &\approx 0.402 - 1.000t, & \|e_{p,1}\|_\infty &\approx 0.402,\\
p_2(t) &\approx 0.0929 + 1.41t - 2.62t^2, & \|e_{p,2}\|_\infty &\approx 0.159,\\
p_3(t) &\approx -0.124 + 3.22t - 6.31t^2 + 2.13t^3, & \|e_{p,3}\|_\infty &\approx 0.124.
\end{aligned}$$
Figure 5.16: Errors of minimax polynomial approximations $p_K(t)$ of $x(t) = t \sin 5t$ on
$[0, 1]$, for degrees $K = 0, 1, 2, 3$. Curves are labeled by the polynomial degree. Dashed
lines mark the maxima and minima, highlighting that for each degree, the minimum is the
negative of the maximum.



These approximation errors $e_{p,K}(t)$ are shown in Figure 5.16. The dashed lines
highlight that the approximation error lies between $\pm\|e_{p,K}\|_\infty$ and reaches these
boundaries at least $K + 2$ times, satisfying the condition of Theorem 5.5. The
intuition behind the theorem is that failing to reach the $\pm\|e_{p,K}\|_\infty$ bounds $K + 2$
times wastes some of the margin for error.
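The lower bounds in Example 5.6 can be reproduced by sampling the least-squares errors on a grid. The procedure is as follows: split the error into runs of constant sign, take the peak $|e_{q,K}|$ in each run, and apply (5.14). Consecutive runs alternate in sign, so any $K + 2$ consecutive runs supply valid points $s_k$, and we keep the best such bound. This is a sketch under our own assumptions: the grid size, the normal-equation solver for the least-squares polynomial (adequate only for small $K$), and the helper names are all illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = lambda t: t * np.sin(5 * t)
nodes, weights = np.polynomial.legendre.leggauss(40)
tq, wq = 0.5 * (nodes + 1.0), 0.5 * weights

def least_squares_poly(K):
    """Coefficients of the L2([0, 1]) least-squares polynomial of degree K.

    Solves the normal equations with the Gram matrix <t^i, t^j> = 1/(i+j+1);
    fine for the small degrees used here, ill-conditioned for large K.
    """
    i = np.arange(K + 1)
    G = 1.0 / (i[:, None] + i[None, :] + 1)
    b = np.array([np.sum(wq * x(tq) * tq**k) for k in i])
    return np.linalg.solve(G, b)

def alternation_lower_bound(err, K):
    """Lower bound (5.14) on the minimax error, from err sampled on a grid."""
    err = err[err != 0]
    breaks = np.flatnonzero(np.diff(np.sign(err))) + 1
    peaks = [np.max(np.abs(run)) for run in np.split(err, breaks)]
    if len(peaks) < K + 2:
        return None                                   # too few sign changes
    # Any K + 2 consecutive sign runs alternate; keep the best window.
    return max(min(peaks[j:j + K + 2]) for j in range(len(peaks) - K - 1))

t = np.linspace(0, 1, 20001)
bounds = {}
for K in range(4):
    e = x(t) - P.polyval(t, least_squares_poly(K))
    bounds[K] = alternation_lower_bound(e, K)
    print(K, round(bounds[K], 4))
```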

Computation of minimax approximations is generally very difficult because
the values of the extrema of the error can depend on the polynomial coefficients in
a complicated way. One common iterative algorithm is presented in Section 5.6.



Chebyshev Polynomials An alternative to aiming for exact minimax approximations
is to use an approximation that is simpler to compute but only nearly minimax.
We now return to our observation regarding the weakness of Lagrange interpolation
with evenly spaced nodes from an $\mathcal{L}^\infty$ perspective: the error is large near
the ends of the interval of approximation. One way to counter this is to have more
nodes near the ends of the interval at the expense of having fewer near the center.
We can estimate the improvement using the bound (5.7b). In fact, it seems sensible
to minimize the factor $\max_t \prod_{k=0}^{K} |t - t_k|$ from (5.7b), even though minimizing the bound
is not the same as minimizing the $\mathcal{L}^\infty$ error. When the interval of approximation
is $[-1, 1]$, the resulting nodes are the zeros of the Chebyshev polynomial of degree $K + 1$.
This result, discussed in more detail below, is one of many instances where we will
encounter Chebyshev polynomials. They are defined as

$$T_k(t) = \cos(k \arccos t), \qquad k \in \mathbb{N}, \qquad (5.15)$$

and are orthogonal on $[-1, 1]$ with the weight function $W(t) = (1 - t^2)^{-1/2}$, that is,

$$\langle x, y\rangle = \int_{-1}^{1} x(t)\, y(t)\, (1 - t^2)^{-1/2}\, dt. \qquad (5.16)$$
Figure 5.17: Chebyshev polynomials up to degree 5, $\{T_k\}_{k=0}^{5}$.



The proof that these are indeed orthogonal polynomials is left for Solved
Exercise 5.1. Chebyshev polynomials satisfy the recursion

$$T_{k+1}(t) = 2t\, T_k(t) - T_{k-1}(t), \qquad (5.17)$$

with $T_0(t) = 1$ and $T_1(t) = t$. The first few Chebyshev polynomials, plotted in
Figure 5.17, are:

$$\begin{aligned}
T_0(t) &= 1, & T_3(t) &= 4t^3 - 3t,\\
T_1(t) &= t, & T_4(t) &= 8t^4 - 8t^2 + 1,\\
T_2(t) &= 2t^2 - 1, & T_5(t) &= 16t^5 - 20t^3 + 5t.
\end{aligned}$$

The roots of $T_k(t)$ are

$$t_m = \cos\left(\frac{2m + 1}{2k}\,\pi\right), \qquad m = 0, 1, \ldots, k - 1, \qquad (5.18)$$

and the relative maxima and minima are

$$t_m = \cos\left(\frac{m}{k}\,\pi\right), \qquad m = 0, 1, \ldots, k. \qquad (5.19)$$

Proofs of these expressions are left as Exercise 5.1.

From (5.15), it is clear that $|T_k(t)| \le 1$ for every $k$ and every $t \in [-1, 1]$; see Figure 5.17.
While the Legendre polynomials also satisfy this bound, they do not cover the
interval $[-1, 1]$ evenly; compare Figure 5.17 to Figure 5.10. Intuitively, the uniform
magnitude of the Chebyshev polynomials makes the truncation error in approximating
a continuous function with a linear combination of Chebyshev polynomials
also have approximately uniform magnitude.
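The recursion (5.17), the closed forms for roots (5.18) and extrema (5.19), and the bound $|T_k(t)| \le 1$ can all be checked numerically. The sketch below is our own illustration (the helper `chebyshev_T` and the grid are assumptions). It also checks that the monic polynomial $2^{-K} T_{K+1}$ has sup norm $2^{-K}$ on $[-1, 1]$, a fact used for near minimax approximation.

```python
import numpy as np

def chebyshev_T(K, t):
    """Return [T_0(t), ..., T_K(t)] computed with the recursion (5.17)."""
    T = [np.ones_like(t), t.copy()]
    for k in range(1, K):
        T.append(2 * t * T[k] - T[k - 1])
    return T[:K + 1]

t = np.linspace(-1, 1, 2001)
T = chebyshev_T(6, t)
# The recursion agrees with the definition T_k(t) = cos(k arccos t) in (5.15).
rec_gap = max(np.max(np.abs(T[k] - np.cos(k * np.arccos(t)))) for k in range(7))

K = 5
roots = np.cos((2 * np.arange(K) + 1) * np.pi / (2 * K))   # (5.18) with k = K
extrema = np.cos(np.arange(K + 1) * np.pi / K)             # (5.19) with k = K
print(rec_gap)
print(np.max(np.abs(np.cos(K * np.arccos(roots)))))        # ~0: these are roots of T_5
print(np.round(np.cos(K * np.arccos(extrema))))            # alternating +1 and -1
# Monic scaling: 2^-K T_{K+1} has sup norm 2^-K on [-1, 1].
print(np.max(np.abs(2.0**(-K) * T[K + 1])), 2.0**(-K))
```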



Near Minimax Approximation A related way to use Chebyshev polynomials is
to interpolate with the roots (5.18) as nodes when approximating a function on
$[-1, 1]$. Among all polynomials with degree $K + 1$ and leading coefficient 1, the
scaled Chebyshev polynomial $2^{-K} T_{K+1}(t)$ has minimum $\mathcal{L}^\infty$ norm, equal to $2^{-K}$.
Figure 5.18: Errors of near minimax polynomial approximations of $x(t) = t \sin 5t$ on
$[0, 1]$, for degrees $K = 0, 1, 2, 3$. Curves are labeled by the polynomial degree.
Approximations are obtained by interpolation with Chebyshev nodes.



Therefore, the factor

$$\max_{t\in[-1,1]} \prod_{k=0}^{K} |t - t_k|$$

in the error bound (5.7b) is minimized by choosing $\{t_k\}_{k=0}^{K}$ to be the K + 1 zeros of $T_{K+1}(t)$. The bound then becomes

$$|e_{q,K}(t)| \le \frac{1}{(K+1)!\,2^K} \max_{\xi\in[-1,1]} \left|x^{(K+1)}(\xi)\right| \qquad (5.20)$$



for approximation of x(t) on [-1, 1] with a polynomial $q_K$ of degree at most K. The error can be bounded relative to the minimax error as

$$\|e_{q,K}\|_\infty \le \left(\frac{2}{\pi}\ln(K+1) + 2\right)\|e_{p,K}\|_\infty, \qquad (5.21)$$

so it is near minimax in a precise sense.

Example 5.7 (Near minimax approximation with Chebyshev polynomials) Return again to the approximation of $x(t) = t\sin 5t$ on [0, 1]. If the interval of interest were [-1, 1], we would obtain near minimax approximations satisfying (5.20) and (5.21) by interpolating with the roots of $T_{K+1}(t)$ as the nodes. The only necessary modification is to map the roots from [-1, 1] to [0, 1] with the affine transformation $t \mapsto (t+1)/2$. The errors of the resulting approximations are plotted for $K \in \{0, 1, 2, 3\}$ in Figure 5.18.

Table 5.1 summarizes the $\mathcal{L}^\infty$ errors of the various approximations from this and previous examples. All methods other than the minimax approximation are significantly easier to compute. Least-squares approximation is a projection onto the subspace of polynomials; it is optimal for the $\mathcal{L}^2$ error by definition, but its $\mathcal{L}^\infty$ error is not necessarily small. A Taylor series expansion is accurate near the point around which the function is expanded (assuming the function is smooth), but quite poor farther away. Interpolation using uniform nodes is improved upon by the use of Chebyshev nodes; this becomes increasingly important as the polynomial degree is increased. Note also that (5.21) is satisfied.

Approximation method                    Polynomial degree
                                        0      1      2      3
Least-squares approximation             0.868  0.489  0.318  0.223
Lagrange interpolation                  0.963  0.785  0.346  0.197
Taylor series expansion around 1/2      1.260  1.000  1.380  1.270
Minimax approximation                   0.661  0.402  0.149  0.124
Near minimax approximation              1.260  0.637  0.298  0.144

Table 5.1: Summary of $\mathcal{L}^\infty$ errors of approximations of $x(t) = t\sin 5t$ on [0, 1].
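As a numerical illustration of Example 5.7, the sketch below interpolates $x(t) = t\sin 5t$ at the roots of $T_{K+1}(t)$ mapped to [0, 1], and estimates the $\mathcal{L}^\infty$ error by sampling a dense uniform grid. It is a minimal sketch with our own helper names, not the code used to produce Figure 5.18 or Table 5.1; because a grid can only underestimate the true maximum, its output should match the near minimax row of the table only approximately.

```python
import math


def x(t):
    """The running example x(t) = t sin(5t)."""
    return t * math.sin(5 * t)


def chebyshev_nodes(K, a=0.0, b=1.0):
    """The K+1 roots of T_{K+1}, mapped affinely from [-1, 1] to [a, b]."""
    n = K + 1
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * m + 1) * math.pi / (2 * n))
            for m in range(n)]


def lagrange_eval(nodes, values, t):
    """Value at t of the polynomial interpolating (nodes[i], values[i])."""
    total = 0.0
    for i, (ti, yi) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, tj in enumerate(nodes):
            if j != i:
                basis *= (t - tj) / (ti - tj)
        total += yi * basis
    return total


def sup_error(K, grid_size=20001):
    """Grid estimate of the L-infinity interpolation error on [0, 1]."""
    nodes = chebyshev_nodes(K)
    values = [x(tk) for tk in nodes]
    grid = (i / (grid_size - 1) for i in range(grid_size))
    return max(abs(x(t) - lagrange_eval(nodes, values, t)) for t in grid)


for K in range(4):
    print(K, round(sup_error(K), 3))
```

The Lagrange form is used only for clarity; for higher degrees, a barycentric formula would be the more robust choice.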



Filter Design A common use of minimax approximation is in the design of FIR filters with linear phase. The design problem can be posed as one of finding coefficients of a polynomial so as to match a certain desired response, and the key is which criterion to optimize. Assume a filter with a symmetric impulse response of length L = 2K + 1, centered at the origin,⁹²

$$h_n = \begin{cases} h_{-n}, & |n| \le K;\\ 0, & \text{otherwise}. \end{cases} \qquad (5.22)$$

Its frequency response is

$$H(e^{j\omega}) = \sum_{n=-K}^{K} h_n e^{-j\omega n} \overset{(a)}{=} h_0 + 2\sum_{n=1}^{K} h_n \cos(n\omega), \qquad (5.23)$$

where (a) follows from the desired linear phase (symmetry) of the filter and (2.275). We see that $H(e^{j\omega})$ is a real and symmetric function of $\omega$. The goal now is to find the coefficients $\{h_n\}_{n=0}^{K}$ so as to approximate a desired frequency response $H^{(d)}(e^{j\omega})$. For the sake of this discussion, we assume that the desired response also corresponds to a symmetric impulse response with real coefficients, that is, $h^{(d)}_n = h^{(d)}_{-n}$ and $h^{(d)}_n \in \mathbb{R}$.

A first, obvious criterion is least-squares approximation, that is, minimizing the quadratic error

$$\min_{\{h_n\}} \|H^{(d)}(e^{j\omega}) - H(e^{j\omega})\|_2^2 = \min_{\{h_n\}} \int_{-\pi}^{\pi} |H^{(d)}(e^{j\omega}) - H(e^{j\omega})|^2\, d\omega. \qquad (5.24)$$

By Parseval's equality (2.103), this is equivalent (up to the factor $2\pi$) to minimizing the time-domain error

$$\min_{\{h_n\}} \|h^{(d)} - h\|_2^2 = \min_{\{h_n\}} \sum_{n=-\infty}^{\infty} (h^{(d)}_n - h_n)^2 = \sum_{|n|>K} (h^{(d)}_n)^2 + \min_{\{h_n\}} \sum_{n=-K}^{K} (h^{(d)}_n - h_n)^2. \qquad (5.25)$$

92 This will produce a noncausal filter, but one having a real frequency response. Once designed, 
one can make the filter causal by a right shift of K. 
Since the cost is a sum of nonnegative terms, and only the terms with $|n| \le K$ depend on the choice of the $h_n$, this minimum is attained for

$$h_n = h^{(d)}_n, \qquad n = -K, -K+1, \ldots, K. \qquad (5.26)$$

In other words, the best least-squares approximation is simply the truncation of the 
desired filter's impulse response to its central 2K + 1 terms. While the quadratic 
error is minimized, the maximum error can remain quite large. In particular, if 
the desired filter's frequency response is discontinuous, as, for example, in an ideal 
lowpass filter (2.107a) , then the Gibbs phenomenon leads to oscillations that do not 
diminish in amplitude, whatever the length (see Figure [278) . 
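The persistence of these oscillations under least-squares truncation can be checked numerically. The sketch below assumes the halfband case $\omega_0/2 = \pi/2$ of Example 5.8 and the standard ideal lowpass coefficients $h^{(d)}_0 = \omega_c/\pi$, $h^{(d)}_n = \sin(\omega_c n)/(\pi n)$ (we assume these match (2.107b)); the helper names are our own. It evaluates (5.23) on a grid of passband frequencies and reports how far the truncated response rises above 1.

```python
import math


def lowpass_coeffs(K, cutoff=math.pi / 2):
    """h_0, ..., h_K of the ideal lowpass filter, truncated to |n| <= K as in (5.26):
    h_0 = cutoff / pi and h_n = sin(cutoff * n) / (pi * n) for n >= 1."""
    return [cutoff / math.pi] + [math.sin(cutoff * n) / (math.pi * n)
                                 for n in range(1, K + 1)]


def response(h, w):
    """Zero-phase frequency response (5.23): h_0 + 2 sum_{n>=1} h_n cos(n w)."""
    return h[0] + 2 * sum(h[n] * math.cos(n * w) for n in range(1, len(h)))


def passband_overshoot(K, cutoff=math.pi / 2, grid_size=5001):
    """Largest amount by which the truncated response exceeds 1 on [0, cutoff)."""
    h = lowpass_coeffs(K, cutoff)
    return max(response(h, cutoff * i / grid_size) for i in range(grid_size)) - 1.0


for K in (7, 25, 51, 101):
    print(K, round(passband_overshoot(K), 4))
```

The overshoot stays close to 0.09 (about 9% of the jump) as K grows, even though the quadratic error (5.25) tends to zero.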

Another approach is minimax approximation, that is, minimizing the maximum error between $H^{(d)}(e^{j\omega})$ and $H(e^{j\omega})$:

$$\|e\|_\infty = \|H^{(d)}(e^{j\omega}) - H(e^{j\omega})\|_\infty = \max_{\omega\in[-\pi,\pi]} |H^{(d)}(e^{j\omega}) - H(e^{j\omega})|. \qquad (5.27)$$

The goal is to minimize $\|e\|_\infty$ over the possible choices of $\{h_n\}_{n=0}^{K}$. Such a minimax solution will satisfy the Chebyshev equioscillation theorem, Theorem 5.5, but to use it, we need to turn our filter design problem into a polynomial approximation problem. To do so, note first that

$$T_n(\cos\omega) = \cos n\omega. \qquad (5.28)$$

This allows us to turn (5.23) into a polynomial in $\cos\omega$. For example, if $h = [d\ c\ b\ [a]\ b\ c\ d]$, then (5.23) becomes



$$\begin{aligned}
H(e^{j\omega}) &= a + 2b\cos\omega + 2c\cos 2\omega + 2d\cos 3\omega\\
&\overset{(a)}{=} a + 2b\cos\omega + 2c(2\cos^2\omega - 1) + 2d(4\cos^3\omega - 3\cos\omega)\\
&= (a - 2c) + (2b - 6d)\cos\omega + 4c\cos^2\omega + 8d\cos^3\omega,
\end{aligned}$$

where in (a) we used (5.28) and the expressions (5.18) for the first few Chebyshev polynomials. The resulting filter's frequency response is a third-degree polynomial in $\cos\omega$. In general,

$$H(e^{j\omega}) = \sum_{k=0}^{K} c_k \cos k\omega = p(t)\big|_{t=\cos\omega}, \qquad (5.29)$$

with $c_0 = h_0$, $c_k = 2h_k$ for $k \ge 1$, and p a polynomial of degree at most K; therefore, the filter design problem becomes a minimax polynomial approximation problem on $t \in [-1, 1]$. In particular, Theorem 5.5 gives a necessary condition for an optimal solution, namely that there will be at least K + 2 points with maximum error. This feature, called the equiripple property, is used in the algorithmic solution to the optimization problem, known as the Parks-McClellan algorithm.
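The change of variables behind (5.29) can be carried out mechanically. Below is a minimal sketch, assuming the filter is given by its nonnegative-index half $[h_0, h_1, \ldots, h_K]$; the function names are our own, and the coefficient values in the example $h = [d\ c\ b\ [a]\ b\ c\ d]$ are made up for illustration. It rebuilds the Chebyshev polynomials with (5.17) and expands $\sum_k c_k T_k(t)$ into powers of t.

```python
import math


def chebyshev_coeffs(K):
    """[T_0, ..., T_K] as lowest-degree-first coefficient lists, via (5.17)."""
    T = [[1], [0, 1]]
    for k in range(1, K):
        two_t_Tk = [0] + [2 * c for c in T[k]]
        prev = T[k - 1] + [0] * (len(two_t_Tk) - len(T[k - 1]))
        T.append([a - b for a, b in zip(two_t_Tk, prev)])
    return T[:K + 1]


def cosine_sum_to_poly(h):
    """Given h = [h_0, ..., h_K] of a symmetric filter, return the coefficients
    (lowest degree first) of p with h_0 + 2 sum_k h_k cos(k w) = p(cos w),
    using T_k(cos w) = cos(k w) from (5.28)."""
    K = len(h) - 1
    weights = [h[0]] + [2 * hk for hk in h[1:]]  # c_0 = h_0, c_k = 2 h_k
    p = [0.0] * (K + 1)
    for ck, Tk in zip(weights, chebyshev_coeffs(K)):
        for i, coef in enumerate(Tk):
            p[i] += ck * coef
    return p


# h = [d c b [a] b c d] from the text, with illustrative values.
a, b, c, d = 0.5, 0.3, 0.1, -0.1
p = cosine_sum_to_poly([a, b, c, d])
print(p)  # should match [a - 2c, 2b - 6d, 4c, 8d] up to floating-point rounding
```

This sketch only performs the conversion; it does not implement the Parks-McClellan exchange iterations that then optimize p(t).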

Example 5.8 (Minimax approximation of an ideal lowpass filter) We consider the design of an ideal lowpass filter with cutoff frequency $\omega_0/2 = \pi/2$; its impulse response is given in (2.107b). Since an ideal filter cannot be attained with an FIR approximation, we will compare the best least-squares approximation with a minimax design. For the latter, we will allow a transition band between passband and stopband. Assume an FIR filter of length 15. The best least-squares approximation is the truncation of the sequence in (2.107b) to n = -7, -6, ..., 7. Note that this is not the best least-squares approximation onto $S = \mathrm{BL}[-\pi/2, \pi/2]$ as in Theorem 4.8.

Figure 5.19: Specification for the approximation of a lowpass filter using minimax optimization. The transition band is of width $2\omega_t$, and $\delta_p$, $\delta_s$ are the error margins for passband and stopband, respectively.

FIGURE TBD

Figure 5.20: Least-squares and minimax approximations to a halfband lowpass filter. (a) Impulse responses. (b) Difference. (c) Magnitude responses. (d) Difference.

For a minimax approximation, introduce a transition band between $\omega_0/2 - \omega_t$ and $\omega_0/2 + \omega_t$. This allows the response to move smoothly from the passband, where it is close to 1, to the stopband, where it is close to 0; see Figure 5.19. To further specify the approximation, we require the passband response to be within $1 \pm \delta_p$ and the stopband response to be bounded by $\delta_s$, where $\delta_p$ and $\delta_s$ are imposed by the application domain of the filter (as is the width of the transition band, $2\omega_t$).

Of course, if $\omega_t$, $\delta_p$ and $\delta_s$ are too stringent, a solution might not exist. Choose, for example,

$$\omega_t = \frac{\pi}{10}, \qquad \delta_p = 0.1, \qquad\text{and}\qquad \delta_s = 0.05. \qquad (5.30)$$
The resulting filter (designed in Matlab using the Remez function) and the truncated ideal filter are shown in Figure 5.20(a), while in (b) the difference between them is magnified. The goal is to see how well the lowpass filter is approximated in the frequency domain, shown in Figure 5.20(c), with the difference in (d). The different styles of approximation are clearly visible. While the minimax solution distributes the error evenly and transitions smoothly from passband to stopband, the least-squares approximation follows the ideal response as closely as possible until there is a large error, right at the transition.

To give some intuition why minimax designs are preferred over least-squares designs in signal processing applications, consider filtering sinusoids. In the minimax case, the variation of the filtering effect across different frequencies is minimized, be it in the passband or the stopband. In the least-squares approximation, while most frequencies are treated close to the ideal case, those close to the transition suffer a large error. Because these frequencies lie at the boundary between passband and stopband, this can lead to noticeable errors, whereas the even distribution of errors in the minimax design is most likely unnoticeable.



5.3 Approximation of Functions by Splines 

In the last section, we saw polynomial approximations that matched the function to be approximated either at specific points or through its derivatives. This is typically done globally over an interval, sometimes leading to poor approximations (for example, at the boundaries). Two solutions can address this problem, namely minimax approximation, which we also saw in the last section, or local polynomial approximation using splines, which we introduce now. The idea of polynomial approximation using splines is to construct shift-invariant subspaces that span polynomials locally. Then, the smooth part of a function is well approximated by its projection onto the shift-invariant subspace. The prototype function used to generate the shift-invariant subspace must satisfy a simple condition to reproduce polynomials, given by the Strang-Fix theorem, Theorem 5.7. The simplest case is given by B-splines, a generalization of the box function with many useful properties.



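B-Splines Consider the elementary box function and its Fourier-transform pair (3.76), with $t_0 = 1$. Call this the elementary B-spline of order 0, or a constant spline,

$$\beta^{(0)}(t) = \begin{cases} 1, & |t| < 1/2;\\ 0, & \text{otherwise}; \end{cases} \quad\overset{\mathrm{FT}}{\longleftrightarrow}\quad B^{(0)}(\omega) = \operatorname{sinc}\left(\frac{\omega}{2}\right). \qquad (5.31a)$$

Define the Nth-order B-spline by repeated convolution of $\beta^{(0)}(t)$ with itself,

$$\beta^{(N)}(t) = \beta^{(N-1)}(t) * \beta^{(0)}(t) \quad\overset{\mathrm{FT}}{\longleftrightarrow}\quad B^{(N)}(\omega) = \left(\operatorname{sinc}\left(\frac{\omega}{2}\right)\right)^{N+1}. \qquad (5.31b)$$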
Figure 5.21: Linear combination of linear splines $\beta^{(1)}(t)$ showing that the result is continuous and differentiable almost everywhere. (a) $x(t)$ with $x_n = [\ldots\ 0\ [1]\ 2\ -1\ 0\ \ldots]$. (b) $x'(t)$, which is well defined except at integers.



Shift-Invariant Subspaces Define the following shift-invariant subspaces:

$$S_N = \begin{cases} \mathrm{span}\big(\{\beta^{(N)}(t-k)\}_{k\in\mathbb{Z}}\big), & N \text{ odd};\\ \mathrm{span}\big(\{\beta^{(N)}(t-k-1/2)\}_{k\in\mathbb{Z}}\big), & N \text{ even}. \end{cases} \qquad (5.32)$$

These spaces are invariant under integer shifts; moreover, the functions belonging to them have the very interesting property of having N - 1 continuous derivatives everywhere, and N continuous derivatives almost everywhere (except at integers). In other words, functions in $S_N$ belong to $C^{N-1}$ (and are almost $C^N$); the proof is left for Exercise 5.11.



5.3.1 Approximation in Shift-Invariant Subspaces Using Splines 

We now show how to use splines to approximate functions in shift-invariant sub- 
spaces. We start with a simple example. 

Example 5.9 (Linear spline basis) Consider the first-order spline function $\beta^{(1)}(t) = \beta^{(0)}(t) * \beta^{(0)}(t)$. This is the hat function we have seen in (3.49a) and Figure 3.3(a); its Fourier transform was given in (3.49b) and Figure 3.3(b),

$$\beta^{(1)}(t) = \begin{cases} 1 - |t|, & |t| < 1;\\ 0, & \text{otherwise}; \end{cases} \quad\overset{\mathrm{FT}}{\longleftrightarrow}\quad B^{(1)}(\omega) = \left(\operatorname{sinc}\left(\frac{\omega}{2}\right)\right)^2. \qquad (5.33)$$

The linear spline is continuous, and differentiable almost everywhere (except at -1, 0, and 1). Take any linear combination of integer-shifted linear splines,

$$x(t) = \sum_{k\in\mathbb{Z}} x_k\,\beta^{(1)}(t-k). \qquad (5.34)$$



Figure 5.21 illustrates such a linear combination, where it can be seen that: (1) $x(n) = x_n$; (2) $x(t)$ is continuous; and (3) $x(t)$ is differentiable, except at integers. Given an arbitrary function x(t), how can we compute its best approximation $\hat{x}(t)$ in $S_1$? Since the linear spline is not orthogonal to its integer shifts, we need to compute the dual basis, which is shift invariant too. That is, we are looking for a function $\tilde\beta^{(1)}(t)$ satisfying the classic biorthogonality relations

$$\langle \tilde\beta^{(1)}(t), \beta^{(1)}(t-n)\rangle_t = \delta_n, \qquad (5.35)$$

as well as

$$\tilde\beta^{(1)}(t) = \sum_{k\in\mathbb{Z}} c_k\,\beta^{(1)}(t-k). \qquad (5.36)$$

Substituting (5.36) into (5.35), we need to find a sequence $c_k$ such that

$$\sum_{k\in\mathbb{Z}} c_k\,\langle \beta^{(1)}(t-k), \beta^{(1)}(t-n)\rangle_t \overset{(a)}{=} \sum_{k\in\mathbb{Z}} c_k\, a^{(1)}_{n-k} = \delta_n, \qquad (5.37)$$

where in (a) we denote by $a^{(1)}_n$ the deterministic autocorrelation function of $\beta^{(1)}(t)$ evaluated at integers. This autocorrelation sequence is

$$a^{(1)}_{n-k} = \begin{cases} 2/3, & k = n;\\ 1/6, & k = n \pm 1;\\ 0, & \text{otherwise}; \end{cases} \quad = \tfrac{2}{3}\delta_{n-k} + \tfrac{1}{6}\delta_{n-k+1} + \tfrac{1}{6}\delta_{n-k-1}, \qquad (5.38)$$

because the linear splines overlap only when shifted by at most 1. With $a^{(1)} = [\ldots\ 1/6\ [2/3]\ 1/6\ \ldots]$, we can rewrite (5.37) as a convolution

$$\sum_{k\in\mathbb{Z}} c_k\, a^{(1)}_{n-k} = \delta_n \quad\overset{\mathrm{ZT}}{\longleftrightarrow}\quad C(z)\,A^{(1)}(z) = 1. \qquad (5.39)$$

We invert $A^{(1)}(z)$ to obtain $C(z)$ as

$$C(z) = \frac{1}{A^{(1)}(z)} = \frac{6}{4 + z + z^{-1}} = \frac{6z^{-1}}{(1 - az^{-1})(1 - a^{-1}z^{-1})} = \frac{\sqrt3}{1 - az^{-1}} - \frac{\sqrt3}{1 - a^{-1}z^{-1}},$$

with $a = \sqrt3 - 2$, and where we used the partial fraction expansion method as explained in Section 2.5. The poles are at $a$ and $1/a$, with $|a| < 1 < |1/a|$. Only the ROC $|a| < |z| < 1/|a|$ contains the unit circle and thus yields a stable sequence; it leads to a symmetric two-sided sequence (the ROCs $|z| < |a|$ and $|z| > 1/|a|$ would give one-sided sequences that are not stable). We read off the individual terms from Table 2.6 to yield

$$c_n = \sqrt3(\sqrt3-2)^{-n}u_{-n-1} + \sqrt3(\sqrt3-2)^{n}u_n = \sqrt3(\sqrt3-2)^{|n|}. \qquad (5.40)$$

This was quite an involved procedure; a faster route is the following. The impulse response $c_n$ will be a symmetric geometric sequence, proportional to

$$[\ldots\ a^3\ a^2\ a\ [1]\ a\ a^2\ a^3\ \ldots].$$

Because of (5.39), its convolution with $a^{(1)}_n$ has to equal 0, except at the origin, so that

$$\frac{1}{6}a^2 + \frac{2}{3}a + \frac{1}{6} = 0,$$

with roots $-2 \pm \sqrt3$. For a stable sequence we need $|a| < 1$, so we choose $a = \sqrt3 - 2$ (equivalently, the ROC $2 - \sqrt3 < |z| < 2 + \sqrt3$), corresponding to a stable, symmetric, two-sided geometric sequence as in (5.40); the constant $\sqrt3$ follows from requiring the convolution to equal 1 at the origin. Using the coefficients we just found, we can compute $\tilde\beta^{(1)}(t)$ as in (5.36), shown in Figure 5.22(b), together with the linear spline in Figure 5.22(a).

Figure 5.22: (a) The linear spline $\beta^{(1)}(t)$ (hat function) (5.33) and (b) its dual $\tilde\beta^{(1)}(t)$ using (5.40) and (5.36).
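The dual linear spline can be checked numerically. The sketch below (helper names are our own) builds $c_n = \sqrt3(\sqrt3-2)^{|n|}$ from (5.40), verifies that its convolution with $a^{(1)} = [1/6\ 2/3\ 1/6]$ is $\delta_n$, and then verifies the biorthogonality (5.35) by integrating products of continuous piecewise-linear functions, for which Simpson's rule on each unit interval is exact. The dual is truncated to $|k| \le 30$, which is harmless because $|\sqrt3 - 2|^{30}$ is about $10^{-17}$.

```python
import math

A = math.sqrt(3) - 2  # the root of a^2 + 4a + 1 = 0 inside the unit circle


def c(n):
    """Coefficients (5.40) of the dual linear spline: c_n = sqrt(3) A^|n|."""
    return math.sqrt(3) * A ** abs(n)


def hat(t):
    """Centered linear spline beta^(1)(t), (5.33)."""
    return max(0.0, 1.0 - abs(t))


def dual_hat(t, K=30):
    """Dual spline (5.36), truncated to |k| <= K."""
    return sum(c(k) * hat(t - k) for k in range(-K, K + 1))


def inner(f, g, lo=-40, hi=40):
    """<f, g> for continuous piecewise-linear f, g with integer breakpoints.
    Their product is quadratic on each [m, m+1], where Simpson's rule is exact."""
    total = 0.0
    for m in range(lo, hi):
        mid = m + 0.5
        total += (f(m) * g(m) + 4 * f(mid) * g(mid) + f(m + 1) * g(m + 1)) / 6
    return total


# Convolution with a^(1) = [1/6, 2/3, 1/6] should give delta_n, as in (5.39).
conv = [c(n - 1) / 6 + 2 * c(n) / 3 + c(n + 1) / 6 for n in range(-3, 4)]
# Biorthogonality (5.35): <dual, beta(t - n)> should be delta_n.
gram = [inner(dual_hat, lambda t, n=n: hat(t - n)) for n in range(-2, 3)]
```

Both `conv` and `gram` should be numerically equal to a unit impulse centered in the list.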

What we saw for the linear spline is generally true for a B-spline of any order; Exercise 5.12 shows it for the quadratic spline. Denote by $a^{(N)}_n$ the deterministic autocorrelation function of $\beta^{(N)}(t)$ evaluated at integers,

$$a^{(N)}_n = \langle \beta^{(N)}(t), \beta^{(N)}(t-n)\rangle_t. \qquad (5.41a)$$

This sequence is nonzero for $|n| \le N$, and we know it is symmetric and its DTFT is positive. According to (2.143b), its z-transform $A^{(N)}(z)$ can be factored into

$$A^{(N)}(z) = R^{(N)}(z)\,R^{(N)}(z^{-1}), \qquad (5.41b)$$

where $R^{(N)}(z)$ has all zeros strictly inside the unit circle. Therefore, there exists an inverse $C^{(N)}(z) = 1/A^{(N)}(z)$ such that its inverse z-transform, $c^{(N)}_n$, is a two-sided stable sequence. From this follows a biorthogonal dual function

$$\tilde\beta^{(N)}(t) = \sum_{k\in\mathbb{Z}} c^{(N)}_k\,\beta^{(N)}(t-k), \qquad (5.41c)$$

satisfying, by construction,

$$\langle \tilde\beta^{(N)}(t), \beta^{(N)}(t-n)\rangle_t = \delta_n. \qquad (5.41d)$$

This allows us to approximate an arbitrary function x(t) by its orthogonal projection onto the space of B-splines of order N,

$$\hat{x}(t) = \sum_{k\in\mathbb{Z}} \langle x(t), \tilde\beta^{(N)}(t-k)\rangle_t\,\beta^{(N)}(t-k). \qquad (5.41e)$$
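For general N, the autocorrelation (5.41a) can be computed without numerical integration: since $\beta^{(N)}$ is even, $a^{(N)}_n = (\beta^{(N)} * \beta^{(N)})(n) = \beta^{(2N+1)}(n)$ by (5.31b). The sketch below uses this identity together with the standard truncated-power formula for centered B-splines (an assumption of this sketch, not a formula from the text) to tabulate $a^{(N)}_n$ and to check that its DTFT is positive, as claimed above. Helper names are our own.

```python
import math


def bspline(N, t):
    """Centered B-spline beta^(N)(t), using the truncated-power formula
    (1/N!) sum_{k=0}^{N+1} (-1)^k C(N+1, k) (t + (N+1)/2 - k)_+^N."""
    if abs(t) >= (N + 1) / 2:
        return 0.0  # outside the support
    total = 0.0
    for k in range(N + 2):
        u = t + (N + 1) / 2 - k
        if u > 0:
            total += (-1) ** k * math.comb(N + 1, k) * u ** N
    return total / math.factorial(N)


def spline_autocorrelation(N):
    """a_n^(N) for n = -N..N, via a_n^(N) = beta^(2N+1)(n)."""
    return [bspline(2 * N + 1, n) for n in range(-N, N + 1)]


def autocorrelation_dtft(a, w):
    """A^(N)(e^{jw}) = sum_n a_n cos(n w) for the symmetric sequence a."""
    N = (len(a) - 1) // 2
    return sum(a_n * math.cos(n * w) for n, a_n in zip(range(-N, N + 1), a))


for N in range(1, 4):
    a = spline_autocorrelation(N)
    worst = min(autocorrelation_dtft(a, math.pi * i / 200) for i in range(201))
    print(N, [round(v, 6) for v in a], round(worst, 6))
```

For N = 1 this reproduces $[1/6\ 2/3\ 1/6]$ from (5.38); for N = 2 it gives $[1/120\ 13/60\ 11/20\ 13/60\ 1/120]$, the sequence relevant to Exercise 5.12.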



5.3.2 Approximation in Shift-Invariant Subspaces Using 
Orthogonalized Splines 

Synthesizing functions based on B-splines is very convenient, since the basis functions $\beta^{(N)}(t-n)$ are of finite support. However, the dual functions $\tilde\beta^{(N)}(t)$ are of infinite support, albeit with fast, exponential decay. An alternative is to derive orthonormal bases for the spaces $S_N$. This can be done via an orthogonalization procedure of the spline basis $\{\beta^{(N)}(t-n)\}_{n\in\mathbb{Z}}$. We use linear splines to illustrate the method.

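Example 5.10 (Orthogonalized linear spline basis) We want to derive a shift-invariant basis $\{\varphi^{(1)}(t-n)\}_{n\in\mathbb{Z}}$ for $S_1$, or

$$\mathrm{span}\big(\{\varphi^{(1)}(t-n)\}_{n\in\mathbb{Z}}\big) = \mathrm{span}\big(\{\beta^{(1)}(t-n)\}_{n\in\mathbb{Z}}\big), \qquad (5.42a)$$

such that it is orthonormal,

$$\langle \varphi^{(1)}(t), \varphi^{(1)}(t-n)\rangle_t = \delta_n.$$

Because $\{\beta^{(1)}(t-n)\}_{n\in\mathbb{Z}}$ form a basis for $S_1$, we can express our desired basis functions as a linear combination of the $\beta^{(1)}(t-n)$,

$$\varphi^{(1)}(t) = \sum_{k\in\mathbb{Z}} d_k\,\beta^{(1)}(t-k). \qquad (5.42b)$$

Imposing orthonormality of the basis functions,

$$\begin{aligned}
\langle \varphi^{(1)}(t), \varphi^{(1)}(t-n)\rangle_t &= \sum_{k\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} d_k d_\ell\,\langle \beta^{(1)}(t-k), \beta^{(1)}(t-n-\ell)\rangle_t \overset{(a)}{=} \sum_{k\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} d_k d_\ell\, a^{(1)}_{n+\ell-k}\\
&\overset{(b)}{=} \sum_{k'\in\mathbb{Z}} \Big(\sum_{k\in\mathbb{Z}} d_k d_{k+k'}\Big) a^{(1)}_{n-k'} \overset{(c)}{=} \sum_{k'\in\mathbb{Z}} a_{k'}\, a^{(1)}_{n-k'} \overset{(d)}{=} (a * a^{(1)})_n,
\end{aligned} \qquad (5.43)$$

where (a) follows from (5.38) and the symmetry of $a^{(1)}$; (b) from the change of variable $k' = k - \ell$ and $\sum_k d_k d_{k-k'} = \sum_k d_k d_{k+k'}$; in (c) we named the autocorrelation $a_{k'} = \sum_k d_k d_{k+k'}$; and (d) from the fact that it is a convolution. In the z-transform domain, orthonormality becomes

$$A(z)\,A^{(1)}(z) = 1, \qquad (5.44)$$

where A(z) is the z-transform of the deterministic autocorrelation of $d_k$.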

Of course, we know that the solution to (5.44) is going to be the same as the solution to (5.39), that is, $A(z) = C(z)$. To find $d_n$ with this method, we take the spectral root of A(z) via $A(z) = D(z)D(z^{-1})$. A possible sequence that has (5.40) as an autocorrelation is

$$d_n = \begin{cases} (3 - \sqrt3)(\sqrt3 - 2)^n, & n \ge 0;\\ 0, & \text{otherwise}, \end{cases} \qquad (5.45)$$

where the constant satisfies $(3 - \sqrt3)^2 = 12 - 6\sqrt3 = \sqrt3\,\big(1 - (\sqrt3 - 2)^2\big)$ (see also Exercise 2.4). The resulting orthonormal basis function,

$$\varphi^{(1)}(t) = \sum_{k=0}^{\infty} d_k\,\beta^{(1)}(t-k), \qquad (5.46)$$

is shown in Figure 5.23.

Figure 5.23: Orthonormal basis function $\varphi^{(1)}(t)$, which, together with its integer shifts, spans $S_1$, the same space spanned by the linear spline and its integer shifts.
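A quick numerical check of Example 5.10, assuming the spectral factor (5.45): the autocorrelation of $d_n$ should reproduce $c_n$ in (5.40), and the resulting $\varphi^{(1)}$ should be orthogonal to its integer shifts. As before, the inner products use Simpson's rule on unit intervals, which is exact for products of continuous piecewise-linear functions with integer breakpoints; the helper names are our own.

```python
import math

A = math.sqrt(3) - 2     # pole inside the unit circle
GAIN = 3 - math.sqrt(3)  # equals sqrt(12 - 6 sqrt(3))


def d(n):
    """Right-sided spectral factor (5.45): d_n = (3 - sqrt 3)(sqrt 3 - 2)^n, n >= 0."""
    return GAIN * A ** n if n >= 0 else 0.0


def hat(t):
    """Centered linear spline beta^(1)(t)."""
    return max(0.0, 1.0 - abs(t))


def phi(t, terms=40):
    """phi^(1)(t) = sum_{k>=0} d_k beta^(1)(t - k), as in (5.46), truncated."""
    return sum(d(k) * hat(t - k) for k in range(terms))


def inner(f, g, lo=-2, hi=45):
    """<f, g> for continuous piecewise-linear f, g with integer breakpoints."""
    total = 0.0
    for m in range(lo, hi):
        mid = m + 0.5
        total += (f(m) * g(m) + 4 * f(mid) * g(mid) + f(m + 1) * g(m + 1)) / 6
    return total


# Autocorrelation of d: should equal c_m = sqrt(3) A^|m| from (5.40).
acf = [sum(d(k) * d(k + m) for k in range(200)) for m in range(4)]
# Orthonormality: <phi(t), phi(t - n)> should be delta_n.
gram = [inner(phi, lambda t, n=n: phi(t - n)) for n in range(4)]
```

Negative shifts need no separate check, since $\langle \varphi^{(1)}(t), \varphi^{(1)}(t+n)\rangle_t = \langle \varphi^{(1)}(t), \varphi^{(1)}(t-n)\rangle_t$ for real functions.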



As for the biorthogonal case of the previous subsection, the orthogonalization method generalizes to any B-spline. In (5.41b), take the spectral root $R^{(N)}(z)$ with all zeros inside the unit circle. Its inverse $D^{(N)}(z) = 1/R^{(N)}(z)$ is stable, and leads to a right-sided sequence $d^{(N)}_n$ from which one derives

$$\varphi^{(N)}(t) = \sum_{k=0}^{\infty} d^{(N)}_k\,\beta^{(N)}(t-k). \qquad (5.47)$$

This function and its integer shifts span $S_N$. Any function x(t) can be approximated by

$$\hat{x}(t) = \sum_{k\in\mathbb{Z}} \langle x(t), \varphi^{(N)}(t-k)\rangle_t\,\varphi^{(N)}(t-k). \qquad (5.48)$$

One remarkable property we saw both for $\beta^{(0)}(t)$ and $\beta^{(1)}(t)$ is their interpolation property, that is, $\beta^{(0)}(n) = \delta_n$ and $\beta^{(1)}(n) = \delta_n$. However, B-splines of higher order do not have this property; for this, one needs to use cardinal splines instead. Exercise 5.13 shows a general orthogonalization procedure.

5.3.3 Continuous-Time Processing Using Discrete-Time 
Operators in Spline Spaces 

From our discussion so far, it is clear that there is a very tight bond between the sequence $x_n$ and the function x(t) generated using a B-spline and its shifts,

$$x(t) = \sum_{k\in\mathbb{Z}} x_k\,\beta^{(N)}(t-k). \qquad (5.49)$$

An interesting manifestation of that bond is that one can perform continuous-time processing on x(t) by performing discrete-time processing on $x_n$. The idea is intuitive: if a signal is in $S_N$, its derivative will be in $S_{N-1}$, and conversely, its integral will be in $S_{N+1}$. Throughout, we will be manipulating Dirac delta functions; as before, we proceed formally without worrying about technical issues, as these were discussed in Chapter 3.

Computing Derivatives To compute a derivative of x(t), we need to compute a derivative of the splines used in the expansion (5.49). We do that by exploiting the fact that they are formed as successive convolutions of box functions (5.31b) and using the derivative formula for convolution (3.66c).

We start with the causal version of the constant spline,

$$\beta^{(0)}(t) = u(t) - u(t-1), \qquad (5.50a)$$

where u(t) is the Heaviside function (3.8). The derivative of the constant spline is

$$\Delta(t) = \frac{d}{dt}\beta^{(0)}(t) = \frac{d}{dt}\big(u(t) - u(t-1)\big) \overset{(a)}{=} \delta(t) - \delta(t-1), \qquad (5.50b)$$

where (a) follows from (3.9). Then, the derivative of the Nth-order spline is

$$\frac{d}{dt}\beta^{(N)}(t) \overset{(a)}{=} \beta^{(N-1)}(t) * \frac{d}{dt}\beta^{(0)}(t) = \beta^{(N-1)}(t) * \Delta(t), \qquad (5.51)$$

where (a) follows from (5.31b) and the derivative formula (3.66c).

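We now rewrite x(t) starting from (5.49) as

$$\begin{aligned}
x(t) &= \sum_{k\in\mathbb{Z}} x_k\,\beta^{(N)}(t-k) \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k \int_{-\infty}^{\infty} \delta(\tau-k)\,\beta^{(N)}(t-\tau)\,d\tau\\
&\overset{(b)}{=} \int_{-\infty}^{\infty} \sum_{k\in\mathbb{Z}} x_k\,\delta(\tau-k)\,\beta^{(N)}(t-\tau)\,d\tau \overset{(c)}{=} \int_{-\infty}^{\infty} \bar{x}(\tau)\,\beta^{(N)}(t-\tau)\,d\tau \overset{(d)}{=} \bar{x}(t) * \beta^{(N)}(t),
\end{aligned} \qquad (5.52)$$

where (a) follows from the sifting property of the Dirac delta function in Table 3.1; in (b) we exchanged the order of integration and summation; in (c) we named $\bar{x}(\tau) = \sum_{k\in\mathbb{Z}} x_k\,\delta(\tau - k)$; and (d) follows from the convolution formula (3.36). To find the derivative of x(t), write

$$\frac{d}{dt}x(t) \overset{(a)}{=} \frac{d}{dt}\big(\bar{x}(t) * \beta^{(N)}(t)\big) \overset{(b)}{=} \bar{x}(t) * \frac{d}{dt}\beta^{(N)}(t) \overset{(c)}{=} \bar{x}(t) * \Delta(t) * \beta^{(N-1)}(t), \qquad (5.53)$$

where (a) follows from (5.52); (b) from (3.66c), with the derivative taken in the sense of distributions; and (c) from (5.51). Now

$$\bar{x}(t) * \Delta(t) \overset{(a)}{=} \sum_{k\in\mathbb{Z}} x_k\,\delta(t-k) * \big(\delta(t) - \delta(t-1)\big) \overset{(b)}{=} \sum_{k\in\mathbb{Z}} x_k\,\delta(t-k) - \sum_{k\in\mathbb{Z}} x_k\,\delta(t-k-1) \overset{(c)}{=} \sum_{k\in\mathbb{Z}} (x_k - x_{k-1})\,\delta(t-k), \qquad (5.54)$$

where (a) uses (5.50b); (b) uses the convolution property of the Dirac delta function from Table 3.1; and (c) gathers terms using a change of variable.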

Starting from (5.53), compute the derivative of a function x(t) in $S_N$ as

$$\frac{d}{dt}x(t) = \bar{x}(t) * \Delta(t) * \beta^{(N-1)}(t) \overset{(a)}{=} \sum_{k\in\mathbb{Z}} (x_k - x_{k-1})\,\delta(t-k) * \beta^{(N-1)}(t) \overset{(b)}{=} \sum_{k\in\mathbb{Z}} x'_k\,\delta(t-k) * \beta^{(N-1)}(t) \overset{(c)}{=} \sum_{k\in\mathbb{Z}} x'_k\,\beta^{(N-1)}(t-k), \qquad (5.55)$$

where (a) follows from (5.54); in (b) we denoted by x' the first-order difference of the sequence x, $x'_k = x_k - x_{k-1}$, the discrete derivative; and (c) follows from the sifting property of the Dirac delta function from Table 3.1.

In other words, to compute a continuous-time derivative of a function $x(t) \in S_N$, we can use its discrete-time derivative sequence $x'_n$ and convolve it with splines of order N - 1; the resulting derivative belongs to $S_{N-1}$,

$$x(t) = \sum_{k\in\mathbb{Z}} x_k\,\beta^{(N)}(t-k), \qquad (5.56a)$$

$$\frac{d}{dt}x(t) = \sum_{k\in\mathbb{Z}} x'_k\,\beta^{(N-1)}(t-k). \qquad (5.56b)$$
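The derivative rule (5.56) can be checked against a finite-difference derivative. The sketch below assumes causal B-splines evaluated with the standard truncated-power formula (an assumption of the sketch, consistent with (5.57) for N = 2) and uses the sequence of Example 5.9 starting at k = 0; the helper names are hypothetical.

```python
import math


def causal_bspline(N, t):
    """Causal B-spline beta^(N)(t), supported on [0, N+1]:
    (1/N!) sum_{k=0}^{N+1} (-1)^k C(N+1, k) (t - k)_+^N."""
    if t <= 0 or t >= N + 1:
        return 0.0  # outside the support
    total = 0.0
    for k in range(N + 2):
        u = t - k
        if u > 0:
            total += (-1) ** k * math.comb(N + 1, k) * u ** N
    return total / math.factorial(N)


def spline_sum(coeffs, N, t):
    """sum_k coeffs[k] beta^(N)(t - k), with the first coefficient at k = 0."""
    return sum(ck * causal_bspline(N, t - k) for k, ck in enumerate(coeffs))


def first_difference(x):
    """Discrete derivative x'_k = x_k - x_{k-1}, with x_k = 0 outside the list.
    The result has one extra sample at the end."""
    padded = [0.0] + list(x) + [0.0]
    return [padded[k + 1] - padded[k] for k in range(len(x) + 1)]


x = [1.0, 2.0, -1.0]      # x_0, x_1, x_2: the sequence of Example 5.9
xp = first_difference(x)  # [1.0, 1.0, -3.0, 1.0]
N, h = 2, 1e-6
for t in (0.3, 1.7, 2.45, 3.2, 4.6):
    numeric = (spline_sum(x, N, t + h) - spline_sum(x, N, t - h)) / (2 * h)
    formula = spline_sum(xp, N - 1, t)  # right-hand side of (5.56b)
    print(t, round(numeric, 6), round(formula, 6))
```

The sample points avoid integers, where the central difference would straddle a breakpoint of the piecewise-quadratic x(t).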



Example 5.11 (Differentiation in a spline basis) We compute derivatives in $S_2$, spanned by $\{\beta^{(2)}(t-k)\}_{k\in\mathbb{Z}}$, where $\beta^{(2)}(t)$ is in its causal version, supported on [0, 3] (we computed it as the convolution of the constant and linear splines),

$$\beta^{(2)}(t) = \begin{cases} t^2/2, & 0 \le t < 1;\\ -t^2 + 3t - 3/2, & 1 \le t < 2;\\ t^2/2 - 3t + 9/2, & 2 \le t < 3;\\ 0, & \text{otherwise}. \end{cases} \qquad (5.57)$$

Its derivative is continuous and piecewise linear,

$$\frac{d}{dt}\beta^{(2)}(t) = \begin{cases} t, & 0 \le t < 1;\\ -2t + 3, & 1 \le t < 2;\\ t - 3, & 2 \le t < 3;\\ 0, & \text{otherwise}. \end{cases}$$
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B™(t) 




1 

-1 


ip {2) (t) 




/ \ ' * 


1 x 


v 2 


Si 


4 







(a) Quadratic spline. (b) Its derivative. 

Figure 5.24: Computing continuous-time derivatives using discrete-time ones. 



The quadratic spline and its derivative are shown in Figure 5.24(a) and (b), respectively. The above is equal to

$$\beta^{(1)}(t) - \beta^{(1)}(t-1) = \Delta(t) * \beta^{(1)}(t),$$

as shown in Figure 5.24.

To compute a derivative of a function in $S_N$, we turn back to Example 5.9 and Figure 5.21. The signal x(t) is generated from the sequence x, whose discrete derivative is x',

$$x = [\ldots\ 0\ [1]\ 2\ -1\ 0\ 0\ \ldots], \qquad x' = [\ldots\ 0\ [1]\ 1\ -3\ 1\ 0\ \ldots].$$

According to (5.55), it is easy to see that we obtain x'(t) as in Figure 5.21(b) (note that there is a left shift by 1, since in Example 5.9 the hat function is centered at the origin rather than being causal).



Computing Integrals We have just seen how to compute continuous-time derivatives using discrete-time ones, with splines of lower order. It is tempting to do the reverse, which is indeed possible with some care. Let us look at (5.51). The primitive of the right side is equal to $\beta^{(N)}(t)$, up to a constant, or

$$\beta^{(N)}(t) = \int_{-\infty}^{t} \Delta(\tau) * \beta^{(N-1)}(\tau)\,d\tau. \qquad (5.58)$$
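We now compute the integral of a function x(t) in $S_N$,

$$\begin{aligned}
\int_{-\infty}^{t} x(\tau)\,d\tau &\overset{(a)}{=} \int_{-\infty}^{t} \sum_{k\in\mathbb{Z}} x_k\,\beta^{(N)}(\tau-k)\,d\tau \overset{(b)}{=} \sum_{k\in\mathbb{Z}} \int_{-\infty}^{t} x_k\,\beta^{(N)}(\tau-k)\,d\tau\\
&\overset{(c)}{=} \sum_{k\in\mathbb{Z}} \int_{-\infty}^{t} \Big(\sum_{n=-\infty}^{k} x_n - \sum_{n=-\infty}^{k-1} x_n\Big)\beta^{(N)}(\tau-k)\,d\tau \overset{(d)}{=} \sum_{k\in\mathbb{Z}} \int_{-\infty}^{t} (X_k - X_{k-1})\,\beta^{(N)}(\tau-k)\,d\tau\\
&\overset{(e)}{=} \sum_{k\in\mathbb{Z}} X_k \int_{-\infty}^{t} \big(\beta^{(N)}(\tau-k) - \beta^{(N)}(\tau-k-1)\big)\,d\tau \overset{(f)}{=} \sum_{k\in\mathbb{Z}} X_k\,\beta^{(N+1)}(t-k),
\end{aligned} \qquad (5.59)$$

where (a) follows from the assumption that $x(t) \in S_N$; in (b) we exchanged the order of summation and integration; in (c) we expressed $x_k$ as the difference of two sums; in (d) we named those sums discrete integrals $X_n = \sum_{\ell=-\infty}^{n} x_\ell$ of the sequence x; (e) follows from the change of variable $k' = k - 1$ in the second sum; and (f) from (5.58).

In other words, to compute a continuous-time integral of a function $x(t) \in S_N$, we can use its discrete-time integral sequence $X_k$ as weighting coefficients for splines of order N + 1; the resulting integral belongs to $S_{N+1}$,

$$x(t) = \sum_{k\in\mathbb{Z}} x_k\,\beta^{(N)}(t-k), \qquad (5.60a)$$

$$\int_{-\infty}^{t} x(\tau)\,d\tau = \sum_{k\in\mathbb{Z}} X_k\,\beta^{(N+1)}(t-k). \qquad (5.60b)$$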
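Similarly, the integration rule (5.60) can be compared against direct numerical integration. This sketch repeats the causal B-spline evaluation from the derivative check (so that the block is self-contained) and uses the sequence of Example 5.12, starting at k = 0; helper names are hypothetical. It tests N = 1 and N = 2, where the integrand is continuous and piecewise polynomial of degree at most 2 between integers, so Simpson's rule on each piece is exact.

```python
import math


def causal_bspline(N, t):
    """Causal B-spline beta^(N)(t), supported on [0, N+1] (truncated-power formula)."""
    if t <= 0 or t >= N + 1:
        return 0.0
    total = 0.0
    for k in range(N + 2):
        u = t - k
        if u > 0:
            total += (-1) ** k * math.comb(N + 1, k) * u ** N
    return total / math.factorial(N)


def spline_sum(coeffs, N, t):
    """sum_k coeffs[k] beta^(N)(t - k), with the first coefficient at k = 0."""
    return sum(ck * causal_bspline(N, t - k) for k, ck in enumerate(coeffs))


def running_sum(x):
    """Discrete integral X_k = sum_{l <= k} x_l, with x_l = 0 for l < 0."""
    out, acc = [], 0.0
    for v in x:
        acc += v
        out.append(acc)
    return out


def integral_via_splines(x, N, t):
    """Right-hand side of (5.60b): sum_k X_k beta^(N+1)(t - k).
    Past the last sample, X_k stays at the total sum, so the list is extended."""
    X = running_sum(x)
    last = max(0, math.ceil(t))
    X = X + [X[-1]] * max(0, last + 1 - len(X))
    return sum(X[k] * causal_bspline(N + 1, t - k) for k in range(last + 1))


def simpson_integral(f, t):
    """Integral of f over [0, t], exact for f piecewise polynomial of degree <= 3
    between integers (Simpson's rule on each piece)."""
    total, a = 0.0, 0.0
    while a < t:
        b = min(a + 1.0, t)
        total += (b - a) * (f(a) + 4 * f((a + b) / 2) + f(b)) / 6
        a = b
    return total


x = [1.0, 1.0, -3.0, 1.0]  # the sequence of Example 5.12, starting at k = 0
print(running_sum(x))      # [1.0, 2.0, -1.0, 0.0]
for N in (1, 2):
    for t in (0.5, 1.25, 3.7, 6.5):
        lhs = simpson_integral(lambda tau: spline_sum(x, N, tau), t)
        print(N, t, round(lhs, 9), round(integral_via_splines(x, N, t), 9))
```

Because $\sum_n x_n = 0$ here, the integral vanishes once t is past the support, matching the discussion in Example 5.12.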



Example 5.12 (Integration in a spline basis) Consider a function x(t) in $S_0$. According to (5.59),

$$\int_{-\infty}^{t} x(\tau)\,d\tau = \sum_{k\in\mathbb{Z}} X_k\,\beta^{(1)}(t-k).$$

For simplicity, assume that x(t) = 0 for t < 0 and t > L, and $\sum_n x_n = 0$, which also means $\int_{-\infty}^{\infty} x(t)\,dt = 0$. The example shown in Figure 5.21(b) satisfies this requirement, where the sequence and its primitive are (the derivative and the sequence in the figure, respectively)

$$x = [\ldots\ 0\ [1]\ 1\ -3\ 1\ 0\ \ldots], \qquad X = [\ldots\ 0\ [1]\ 2\ -1\ 0\ 0\ \ldots].$$

As was seen in Example 5.9,

$$x(t) = \sum_{k\in\mathbb{Z}} x_k\,\beta^{(0)}(t-k) \qquad (5.61a)$$

and

$$\int_{-\infty}^{t} x(\tau)\,d\tau = \sum_{k\in\mathbb{Z}} X_k\,\beta^{(1)}(t-k) \qquad (5.61b)$$

are related by integration, verifying (5.59).

Because of their simple form as piecewise polynomials, B-splines have a number of remarkable properties, as we have just seen. For example, another interesting result involving splines is that inner products between an arbitrary function x(t) and a spline $\beta^{(N)}(t)$ can be computed by integrating x(t) N + 1 times, evaluating this integral at integers, and then taking N + 1 discrete differences of the resulting sequence; see Exercise 5.14. B-splines are the simplest members of the spline family. Many generalizations are possible (for example, nonuniform, exponential); for pointers, see Further Reading.

5.3.4 Polynomial Reproduction and Strang-Fix Theorem 

The spans of B-splines and their integer shifts were introduced in (5.32) as the spaces $S_N$. We now show that polynomials of degree up to N belong to these spaces, that is,

$$p_N(t) = \sum_{k\in\mathbb{Z}} \alpha_k\,\beta^{(N)}(t-k). \qquad (5.62)$$

This is a particular case of a more general result, the Strang-Fix theorem, to be discussed shortly. We start with an example.

Example 5.13 (Polynomial reproduction with the quadratic spline) We consider the quadratic B-spline as in (5.57), except centered, so that

$$\beta^{(2)}(t) = \begin{cases} 3/4 - t^2, & |t| \le 1/2;\\ \tfrac12\big(|t| - 3/2\big)^2, & 1/2 < |t| \le 3/2;\\ 0, & \text{otherwise}. \end{cases}$$

(i) We first show that $\{\beta^{(2)}(t-k)\}_{k\in\mathbb{Z}}$ reproduce constants, that is,

$$\sum_{n\in\mathbb{Z}} \beta^{(2)}(t-n) = 1. \qquad (5.63a)$$

The sum is a periodic function of period 1, and only 3 copies of $\beta^{(2)}$ overlap at any point. It is enough to prove that $\beta^{(2)}(t+1) + \beta^{(2)}(t) + \beta^{(2)}(t-1) = 1$ on the interval [-1/2, 1/2]. Apart from the central term, the left tail is right shifted by 1, and the right tail is left shifted by 1. Putting the three components together,

$$\tfrac12\big(t - \tfrac12\big)^2 + \big(\tfrac34 - t^2\big) + \tfrac12\big(t + \tfrac12\big)^2 = \tfrac14 + t^2 + \tfrac34 - t^2 = 1.$$

(ii) We now show that $\{\beta^{(2)}(t-k)\}_{k\in\mathbb{Z}}$ reproduce linear terms, that is,

$$\sum_{n\in\mathbb{Z}} n\,\beta^{(2)}(t-n) = t. \qquad (5.63b)$$

Consider the interval [n - 1/2, n + 1/2]. The overlapping left, central and right portions of $\beta^{(2)}$ are

$$\tfrac12\big(n + \tfrac12 - t\big)^2, \qquad \tfrac34 - (t-n)^2, \qquad\text{and}\qquad \tfrac12\big(t - n + \tfrac12\big)^2,$$

and they are weighted by n - 1, n and n + 1, respectively,

$$\tfrac12(n-1)\big(n + \tfrac12 - t\big)^2 + n\big(\tfrac34 - (t-n)^2\big) + \tfrac12(n+1)\big(t - n + \tfrac12\big)^2 = t.$$

(iii) Finally, we show that $\{\beta^{(2)}(t-k)\}_{k\in\mathbb{Z}}$ reproduce quadratic terms, that is,

$$\sum_{n\in\mathbb{Z}} n^2\,\beta^{(2)}(t-n) = t^2 + \text{constant}. \qquad (5.63c)$$

We can check that

$$\tfrac12(n-1)^2\big(n + \tfrac12 - t\big)^2 + n^2\big(\tfrac34 - (t-n)^2\big) + \tfrac12(n+1)^2\big(t + \tfrac12 - n\big)^2 = t^2 + \tfrac14.$$

Then, any polynomial of second degree can be written as a linear combination,

$$a_2 t^2 + a_1 t + a_0 = \sum_{n\in\mathbb{Z}} \alpha_n\,\beta^{(2)}(t-n), \qquad (5.64)$$

with $\alpha_n = a_2(n^2 - 1/4) + a_1 n + a_0$.
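The identities (5.63a) to (5.63c), and the coefficients $\alpha_n$ in (5.64), can be spot-checked numerically with the centered quadratic B-spline. This is a minimal sketch with our own helper names; it evaluates the centered B-spline with the standard truncated-power formula (an assumption of the sketch) and sums over enough shifts to cover the support near the test points.

```python
import math


def bspline(N, t):
    """Centered B-spline beta^(N)(t), using the truncated-power formula
    (1/N!) sum_{k=0}^{N+1} (-1)^k C(N+1, k) (t + (N+1)/2 - k)_+^N."""
    if abs(t) >= (N + 1) / 2:
        return 0.0  # outside the support; avoids round-off from cancelling terms
    total = 0.0
    for k in range(N + 2):
        u = t + (N + 1) / 2 - k
        if u > 0:
            total += (-1) ** k * math.comb(N + 1, k) * u ** N
    return total / math.factorial(N)


def moment_sum(N, k, t, n_max=10):
    """sum_n n^k beta^(N)(t - n) over |n| <= n_max (enough for |t| well below n_max)."""
    return sum(n ** k * bspline(N, t - n) for n in range(-n_max, n_max + 1))


def reproduce_quadratic(a2, a1, a0, t):
    """sum_n alpha_n beta^(2)(t - n) with alpha_n = a2 (n^2 - 1/4) + a1 n + a0,
    which by (5.63a)-(5.63c) should equal a2 t^2 + a1 t + a0."""
    return sum((a2 * (n * n - 0.25) + a1 * n + a0) * bspline(2, t - n)
               for n in range(-10, 11))


for t in (-0.7, 0.0, 0.3, 1.9):
    # Expect approximately 1, t, and t^2 + 1/4, as in (5.63a)-(5.63c).
    print(t, moment_sum(2, 0, t), moment_sum(2, 1, t), moment_sum(2, 2, t))
```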

To prove polynomial reproduction by B-splines in general, we can use the derivative formula (5.55) and the integral formula (5.59). We claim that

$$\sum_{n\in\mathbb{Z}} n^k\,\beta^{(N)}(t-n) = \sum_{\ell=0}^{k} a_\ell\, t^\ell, \qquad k = 0, 1, \ldots, N. \qquad (5.65)$$

Using the derivative formula k times shows that the linear combination is at most a polynomial of degree k, while using the integral formula k times shows that the result is a polynomial of degree k (see Exercise 5.15). Implicitly, we use the following result:



Proposition 5.6 (Reproduction of identity) Let $\varphi(t) \in \mathcal{L}^1(\mathbb{R})$ and let $\varphi_1(t)$ be its periodized version with period 1 as in (3.40). Then, the following are equivalent:

$$\varphi_1(t) = \sum_{n\in\mathbb{Z}} \varphi(t-n) = 1 \quad\Longleftrightarrow\quad \Phi(2\pi k) = \delta_k. \qquad (5.66)$$

nGZ 



Proof. Since $\varphi_1(t)$ is periodic, it can be represented as a Fourier series (3.90b), with coefficients from (3.90a) (here with period T = 1),

$$\begin{aligned}
\Phi_{1,k} &= \int_{-1/2}^{1/2} \varphi_1(t)\,e^{-j2\pi kt}\,dt \overset{(a)}{=} \int_{-1/2}^{1/2} \Big(\sum_{n\in\mathbb{Z}} \varphi(t-n)\Big) e^{-j2\pi kt}\,dt \overset{(b)}{=} \sum_{n\in\mathbb{Z}} \int_{-1/2}^{1/2} \varphi(t-n)\,e^{-j2\pi kt}\,dt\\
&\overset{(c)}{=} \sum_{n\in\mathbb{Z}} \int_{-1/2-n}^{1/2-n} \varphi(\tau)\,e^{-j2\pi k(\tau+n)}\,d\tau \overset{(d)}{=} \int_{-\infty}^{\infty} \varphi(\tau)\,e^{-j2\pi k\tau}\,d\tau \overset{(e)}{=} \Phi(2\pi k),
\end{aligned}$$

where (a) follows from the definition of $\varphi_1$ in (5.66); in (b) we interchanged the sum and integral; (c) follows from the change of variable $\tau = t - n$; (d) from the periodicity of the complex exponential ($e^{-j2\pi kn} = 1$), where we also merged all individual unit-length intervals into a single interval over $\mathbb{R}$; and (e) is the Fourier transform evaluated at $\omega = 2\pi k$. If $\varphi_1(t) = 1$, then $\Phi_{1,k} = \Phi(2\pi k) = \delta_k$; conversely, if $\Phi(2\pi k) = \Phi_{1,k} = \delta_k$, then $\varphi_1(t) = 1$.

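An immediate consequence of this proposition is that for B-splines,

$$\sum_{n\in\mathbb{Z}} \beta^{(N)}(t-n) = 1, \qquad (5.67)$$

which follows from (5.31b), since $B^{(N)}(2\pi k) = (\operatorname{sinc}(\pi k))^{N+1} = \delta_k$. A generalization of the above result is the following theorem: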



Theorem 5.7 (Polynomial reproduction (Strang-Fix)) Consider a function $\varphi(t)$ and an integer K. If $\varphi(t)$ has sufficient localization,

$$\int_{-\infty}^{\infty} (1 + |t|^K)\,|\varphi(t)|\,dt < \infty, \qquad (5.68)$$

then the following are equivalent:

(i) Polynomials of degree $k \le K$ are in the span of $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$.

(ii) The Fourier transform of $\varphi(t)$, $\Phi(\omega)$, and its first K derivatives satisfy

$$\Phi(0) \ne 0 \qquad\text{and}\qquad \Phi^{(k)}(2\pi\ell) = 0, \qquad k = 0, 1, \ldots, K,\ \ \ell \in \mathbb{Z}\setminus\{0\}. \qquad (5.69)$$



The proof of the theorem can be found in the book of Strang and Fix; see Further Reading. Solved Exercise 5.2 proves the theorem when $\varphi(t)$ is an interpolating function, that is, when

$$\varphi(n) = \delta_n, \qquad n \in \mathbb{Z}. \qquad (5.70)$$

In the statement of the theorem, the localization property (5.68) is trivially satisfied by any integrable $\varphi(t)$ having finite support; for infinitely supported $\varphi(t)$, however, sufficient decay is required. The Fourier transforms of B-splines of order N have zeros of order exactly N + 1 at nonzero multiples of $2\pi$, since they are the product of N + 1 sinc functions (see (5.31b)), while $B^{(N)}(0) = 1$. Therefore, B-splines of order N reproduce polynomials of degree up to N, as we had seen earlier.
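Example 5.14 (Polynomial reproduction) The quadratic B-spline has Fourier transform (from (5.31b))

$$B^{(2)}(\omega) = \left(\operatorname{sinc}\frac{\omega}{2}\right)^{3}.$$

Let us check the second condition in the theorem, (5.69), for $k = 0, 1, 2$. For $k = 0$, we simply have the sinc function to the third power, so clearly

$$B^{(2)}(2\pi\ell) = \left(\operatorname{sinc}(\pi\ell)\right)^{3} = \delta_\ell,$$

which is nonzero at the origin and zero at all other multiples of $2\pi$. For the first derivative, $k = 1$, we get

$$\left.\frac{d}{d\omega}B^{(2)}(\omega)\right|_{\omega=2\pi\ell} = \left.3\left(\operatorname{sinc}\frac{\omega}{2}\right)^{2}\frac{d}{d\omega}\operatorname{sinc}\frac{\omega}{2}\right|_{\omega=2\pi\ell} = 0, \qquad \ell \neq 0, \qquad (5.71)$$

because $\operatorname{sinc}(\pi\ell) = 0$ for $\ell \neq 0$. (At the origin, where $\operatorname{sinc}(\pi\ell)$ is nonzero, the derivative vanishes as well, because $\operatorname{sinc}(\omega/2)$ is symmetric around 0.) Taking one more derivative of (5.71) leads to

$$\left.\frac{d^2}{d\omega^2}B^{(2)}(\omega)\right|_{\omega=2\pi\ell} = \left.\left[6\operatorname{sinc}\frac{\omega}{2}\left(\frac{d}{d\omega}\operatorname{sinc}\frac{\omega}{2}\right)^{2} + 3\left(\operatorname{sinc}\frac{\omega}{2}\right)^{2}\frac{d^2}{d\omega^2}\operatorname{sinc}\frac{\omega}{2}\right]\right|_{\omega=2\pi\ell} = 0, \qquad \ell \neq 0, \qquad (5.72)$$

by the same argument as before, since both terms contain a factor $\operatorname{sinc}(\pi\ell) = 0$. Then, the theorem states that linear combinations of $\{\beta^{(2)}(t-n)\}_{n\in\mathbb{Z}}$ can reproduce polynomials up to degree 2, as we had already seen in Example 5.13 using a time-domain argument.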
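The time-domain counterpart of Example 5.14 can be checked numerically. The sketch below is a minimal illustration under our own assumptions: it uses the centered quadratic B-spline $\beta^{(2)}(t)$, supported on $[-3/2, 3/2]$, and the hypothetical helpers `quadratic_bspline` and `reproduce`. The coefficients $n^2 - 1/4$ used for $t^2$ come from our own short calculation ($1/4$ is the second moment $\int t^2 \beta^{(2)}(t)\,dt$), not from the text.

```python
import numpy as np

def quadratic_bspline(t):
    """Centered quadratic B-spline beta^(2)(t), supported on [-3/2, 3/2]."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a < 0.5, 0.75 - a**2,
                    np.where(a < 1.5, 0.5 * (a - 1.5) ** 2, 0.0))

def reproduce(coeff, t, n_range=range(-10, 11)):
    """Evaluate sum_n coeff(n) * beta^(2)(t - n) on the grid t (finitely many shifts)."""
    return sum(coeff(n) * quadratic_bspline(t - n) for n in n_range)

t = np.linspace(-3.0, 3.0, 601)                 # well inside the range of shifts
const = reproduce(lambda n: 1.0, t)             # should equal 1
linear = reproduce(lambda n: float(n), t)       # should equal t
quad = reproduce(lambda n: n**2 - 0.25, t)      # should equal t**2

print(np.max(np.abs(const - 1)), np.max(np.abs(linear - t)), np.max(np.abs(quad - t**2)))
```

Only finitely many shifts are summed, so the grid for $t$ is kept well inside the range of shifts to avoid edge effects.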

5.4 Approximation of Functions and Sequences by 
Series Truncation 

In our discussion of sampling expansions in Chapter 4, sampling followed by interpolation creates an approximation in a predetermined subspace. That subspace $S$ is determined by the interpolation operator, while the sampling operator determines the manner of projecting onto $S$, namely orthogonally to a subspace $\widetilde{S}$. The reconstruction is a series expansion using a basis for $S$. This setting was specifically geared to sampling/interpolation systems used in analog-to-digital conversion, and the subspace $S$ is therefore typically a shift-invariant subspace.

We will now generalize this setting to consider approximation in any basis, orthogonal or biorthogonal, and even in frames. Given a class of functions or sequences, which bases are good to use in the approximation? Which approximation method and which measure of approximation should we use? We consider answers to these questions for linear and nonlinear approximations, in deterministic as well as stochastic settings. In all cases, approximation is by series truncation; the series in question might be reordered first and then truncated.
5.4.1 Linear Approximation

Consider a space $S$ for which we have an orthonormal basis $\{\varphi_k\}_{k\in\mathbb{N}}$. Given any $x \in S$, we can write it as the expansion (1.85a),

$$x = \sum_{k\in\mathbb{N}} \langle x, \varphi_k\rangle \varphi_k. \qquad (5.73)$$

Consider now an $M$-term approximation, $x_M$, as the orthogonal projection onto $S_M$, the space spanned by the first $M$ basis vectors. The approximation, the difference, and the quadratic approximation error are

$$x_M = \sum_{k=0}^{M-1} \langle x, \varphi_k\rangle \varphi_k, \qquad (5.74a)$$
$$d_M = x - x_M = \sum_{k=M}^{\infty} \langle x, \varphi_k\rangle \varphi_k, \qquad (5.74b)$$
$$\varepsilon_M^2 = \|d_M\|^2 = \|x - x_M\|^2 = \sum_{k=M}^{\infty} |\langle x, \varphi_k\rangle|^2. \qquad (5.74c)$$

From the projection theorem, Theorem 1.26, we know that the error of the approximation is orthogonal to the approximation itself, and thus

$$d_M \perp x_M, \qquad (5.75a)$$
$$\|x\|^2 = \|x_M\|^2 + \|d_M\|^2. \qquad (5.75b)$$

Our statements so far were about orthonormal bases; with care, they can be extended to biorthogonal ones (see Exercise 5.16).

The expressions above are all deterministic: a given $x$ is approximated in a fixed basis by keeping a fixed set of terms, chosen independently of $x$. The classes of objects are functions with certain characteristics, such as continuous functions. A stochastic version simply considers the approximation of a stochastic process $x$ as in (5.74a), leading to an expected quadratic error,

$$E[\varepsilon_M^2] = E[\|x - x_M\|^2]. \qquad (5.76)$$
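A finite-dimensional check of (5.74) and (5.75) follows. This is a sketch under stated assumptions: a random orthonormal basis of $\mathbb{R}^{16}$, obtained from a QR factorization, stands in for $\{\varphi_k\}$, and $M = 5$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 16, 5

# Hypothetical orthonormal basis: columns of Q from a QR factorization (Q^T Q = I).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
x = rng.standard_normal(N)

alpha = Q.T @ x                   # expansion coefficients <x, phi_k>, as in (5.73)
x_M = Q[:, :M] @ alpha[:M]        # keep the first M terms, (5.74a)
d_M = x - x_M                     # difference, (5.74b)
eps2 = np.sum(alpha[M:] ** 2)     # tail energy, (5.74c)

print(np.dot(d_M, d_M), eps2)                   # equal up to rounding
print(np.dot(d_M, x_M))                         # about 0: d_M is orthogonal to x_M, (5.75a)
print(np.dot(x, x), np.dot(x_M, x_M) + eps2)    # Pythagoras, (5.75b)
```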

5.4.2 Nonlinear Approximation

Among the vast set of possible approximations beyond the linear case seen above, we consider a nonlinear approximation method in bases. Consider again a space $S$ for which we have an orthonormal basis $\{\varphi_k\}_{k\in\mathbb{N}}$, such that (5.73) holds for all $x \in S$. Now, suppose we can choose any $M$ terms in the expansion (5.73), not necessarily the first $M$ as in (5.74a). As we are expanding in an orthonormal basis, it is not hard to see that we should choose the $M$ largest-magnitude coefficients so as to minimize the quadratic approximation error.

Start by creating an ordered sequence $\{\alpha_{k_n}\}$ of the expansion coefficients

$$\alpha_k = \langle x, \varphi_k\rangle,$$

such that $|\alpha_{k_n}| \geq |\alpha_{k_{n+1}}|$ for $n \in \mathbb{N}$ (when the inequality is not strict, the ordering is not unique). To choose the $M$ largest-magnitude coefficients, define the set $\mathcal{I}_M(x)$,

$$\mathcal{I}_M(x) = \{k_0, k_1, \ldots, k_{M-1}\}, \qquad (5.77)$$

where we made explicit that this set depends on $x$. Because of our choice of $\mathcal{I}_M(x)$,

$$\sum_{k\in\mathcal{I}_M} |\alpha_k|^2 \geq \sum_{k\in\mathcal{J}_M} |\alpha_k|^2, \qquad (5.78)$$

where $\mathcal{J}_M$ is any other set of $M$ indices. Call $x_M$ and $x'_M$ the projections of $x$ onto the spaces spanned by $\{\varphi_k\}_{k\in\mathcal{I}_M}$ and $\{\varphi_k\}_{k\in\mathcal{J}_M}$, respectively,

$$x_M = \sum_{k\in\mathcal{I}_M} \langle x, \varphi_k\rangle \varphi_k, \qquad (5.79a)$$
$$x'_M = \sum_{k\in\mathcal{J}_M} \langle x, \varphi_k\rangle \varphi_k. \qquad (5.79b)$$

Then $x_M$ is the best $M$-term approximation (possibly nonunique) of $x$. In other words,

$$\|x - x_M\|^2 \leq \|x - x'_M\|^2. \qquad (5.80)$$

Since we are in an orthonormal expansion, this is equivalent to

$$\sum_{n\notin\mathcal{I}_M} |\alpha_n|^2 \leq \sum_{n\notin\mathcal{J}_M} |\alpha_n|^2, \qquad (5.81)$$

which follows from (5.78) because

$$\|x\|^2 = \sum_{n\in\mathcal{I}_M} |\alpha_n|^2 + \sum_{n\notin\mathcal{I}_M} |\alpha_n|^2 = \sum_{n\in\mathcal{J}_M} |\alpha_n|^2 + \sum_{n\notin\mathcal{J}_M} |\alpha_n|^2.$$

From now on, we use the set $\mathcal{I}_M(x)$ in (5.77) to obtain the best nonlinear approximation $x_M$ in (5.79a). The difference and the quadratic approximation error are

$$d_M = x - x_M = \sum_{k\notin\mathcal{I}_M} \langle x, \varphi_k\rangle \varphi_k, \qquad (5.82a)$$
$$\varepsilon_M^2 = \|d_M\|^2 = \|x - x_M\|^2 = \sum_{k\notin\mathcal{I}_M} |\langle x, \varphi_k\rangle|^2. \qquad (5.82b)$$

As before, the error of the approximation is orthogonal to the approximation itself,

$$d_M \perp x_M, \qquad (5.83a)$$
$$\|x\|^2 = \|x_M\|^2 + \|d_M\|^2. \qquad (5.83b)$$

This approximation method is nonlinear because if $x_M$ and $y_M$ are the approximations of $x$ and $y$ according to (5.79a), then the approximation of $x + y$ is in general not equal to $x_M + y_M$. This is easy to see, because the sets of largest-magnitude coefficients are usually different,

$$\mathcal{I}_M(x) \neq \mathcal{I}_M(y),$$

that is, the approximation depends on the signal we wish to approximate. The difference between linear and nonlinear approximations is illustrated in Figure 5.25; see also Exercise 5.17.

Figure 5.25 [panels: (a) Linear approximation; (b) Nonlinear approximation]: One-dimensional approximation in $\mathbb{R}^2$. The orthonormal basis is $\varphi_0 = [1\ \ 1]^T/\sqrt{2}$ and $\varphi_1 = [-1\ \ 1]^T/\sqrt{2}$. (a) Linear approximation by keeping only the first coefficient, or $S = \operatorname{span}(\{\varphi_0\})$. (b) Nonlinear approximation by keeping the largest-magnitude coefficient. The first and third quadrants will be approximated by $\varphi_0$, the rest by $\varphi_1$.
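The two-dimensional setting of Figure 5.25 makes the nonlinearity concrete. In the sketch below (our own illustration; the helper `best_m_term` and the test vectors are hypothetical), the best $M$-term approximation keeps the $M$ largest-magnitude coefficients, and approximating $x + y$ gives a different result from adding the separate approximations.

```python
import numpy as np

def best_m_term(x, basis, M):
    """Keep the M largest-magnitude coefficients of x in an orthonormal basis.

    Columns of `basis` are the (assumed orthonormal) vectors phi_k.
    Returns the approximation x_M and the sorted index set I_M(x).
    """
    alpha = basis.conj().T @ x              # alpha_k = <x, phi_k>
    idx = np.argsort(-np.abs(alpha))[:M]    # indices of the M largest |alpha_k|
    return basis[:, idx] @ alpha[idx], np.sort(idx)

# Basis of Figure 5.25: phi_0 = [1, 1]/sqrt(2), phi_1 = [-1, 1]/sqrt(2) (as columns).
Phi = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)
x = np.array([2.0, 1.0])     # first quadrant: approximated by phi_0
y = np.array([-1.0, 0.5])    # second quadrant: approximated by phi_1

x1, Ix = best_m_term(x, Phi, 1)
y1, Iy = best_m_term(y, Phi, 1)
s1, _ = best_m_term(x + y, Phi, 1)
print(Ix, Iy)        # different index sets
print(s1, x1 + y1)   # the approximation of x + y is not x_1 + y_1
```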

Interestingly, the difference between linear and nonlinear approximations can be substantial, depending on the class of signals and the type of basis. As we know, Fourier bases are not good at approximating signals containing discontinuities, be it using linear or nonlinear approximation, due to the Gibbs phenomenon. Changing the representation basis to wavelets and using nonlinear instead of linear approximation changes the picture completely, as illustrated in Figure 5.5 (linear approximation using Fourier series) and Figure 5.6 (nonlinear approximation using the Haar basis).

5.4.3 Approximation in Fourier Bases 

When approximating a function $x(t)$ by either linear or nonlinear approximation, the key is the decay of the expansion coefficients, either in their natural ordering in the linear case, or reordered as given by $\mathcal{I}_M(x)$ in (5.77) in the nonlinear case.

As we have seen for both the Fourier transform in (3.79) and Fourier series in (3.116), smoothness of the time-domain function and decay of the Fourier transform or Fourier series coefficients are closely connected. We focus here on Fourier series, since it provides an orthonormal basis for one period of periodic functions. The condition in (3.116) can be summarized as

$$|X_k| < \frac{\gamma}{1+|k|^{p+1+\epsilon}} \quad\Longrightarrow\quad \sum_{k\in\mathbb{Z}} |k|^p\,|X_k| < \infty, \qquad (5.84)$$

for some $\gamma > 0$ and $\epsilon > 0$, in which case $x(t)$ is $p$-times continuously differentiable. The converse, similarly to (3.79c), is that if $x(t)$ is $p$-times continuously differentiable, then $X_k$ decays at least as

$$|X_k| < \frac{\gamma}{1+|k|^{p+1}}. \qquad (5.85)$$

These decay rates allow us to bound the approximation error in the Fourier series, as we now illustrate.

Example 5.15 (Linear and nonlinear approximation in Fourier series) Let $x(t)$ be the box function of period 1, width 1/2, and norm 1, centered at the origin. According to Table 3.6, $x(t)$ and its Fourier series are given by

$$x(t) = \begin{cases} \sqrt{2}, & |t| \leq 1/4; \\ 0, & \text{otherwise}, \end{cases} \qquad X_k = \frac{\sqrt{2}}{2}\operatorname{sinc}\!\left(\frac{\pi k}{2}\right).$$

This Fourier series is symmetric around the origin, and all even terms (except at the origin) are zero,

$$X = \sqrt{2}\left[\ \cdots\ \ -\tfrac{1}{3\pi}\ \ 0\ \ \tfrac{1}{\pi}\ \ \tfrac{1}{2}\ \ \tfrac{1}{\pi}\ \ 0\ \ -\tfrac{1}{3\pi}\ \ \cdots\ \right],$$

with $X_0 = \sqrt{2}/2$ in the middle. Using linear approximation, we keep the central $M = 4K - 1$ terms, $|k| \leq 2K - 1$, with the quadratic error

$$\varepsilon_M^2 = \sum_{|k|\geq 2K} |X_k|^2 = \frac{4}{\pi^2}\sum_{k\geq K} \frac{1}{(2k+1)^2}.$$

By approximating this sum with an integral, we see that the quadratic error decays as $\gamma_0/M$. Choosing nonlinear approximation instead, we can skip all the zero terms; this will improve the approximation constant, but not the order, as the quadratic error still decays as $\gamma_1/M$, with $\gamma_1 < \gamma_0$.

In general, the quadratic error for an $M$-term approximation of the Fourier series of a $p$-times differentiable $x(t)$, whose Fourier series decays as in (5.85), is of the order

$$\varepsilon_M^2 \sim \sum_{|k|\geq M/2} \frac{\gamma_2}{1+|k|^{2p+2}} \sim \frac{\gamma_3}{M^{2p+1}}.$$

Moreover, this holds for both linear and nonlinear approximation, as we have just seen in Example 5.15.
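The decay rates in Example 5.15 can be checked numerically. The sketch below rests on our own assumptions: it uses NumPy's normalized `np.sinc` (which computes $\sin(\pi u)/(\pi u)$, so the text's $\operatorname{sinc}(\pi k/2)$ is `np.sinc(k / 2)`) and the closed form $\|x\|^2 = 1$ to get the error as one minus the kept energy. It prints $M\varepsilon_M^2$, which should level off if $\varepsilon_M^2 \sim \gamma/M$; a short calculation from the error sum in the example suggests the limits $4/\pi^2$ (linear) and $2/\pi^2$ (nonlinear).

```python
import numpy as np

def box_coeffs(kmax):
    """X_k = (sqrt(2)/2) sinc(pi k / 2), k = -kmax..kmax (unit-norm box, width 1/2, period 1)."""
    k = np.arange(-kmax, kmax + 1)
    return (np.sqrt(2) / 2) * np.sinc(k / 2)   # np.sinc(u) = sin(pi u)/(pi u)

def linear_error(K):
    """Keep the central M = 4K - 1 coefficients |k| <= 2K - 1; uses ||x||^2 = 1."""
    X = box_coeffs(2 * K - 1)
    return X.size, 1.0 - np.sum(X ** 2)

def nonlinear_error(M, kmax):
    """Keep the M largest-magnitude coefficients among |k| <= kmax."""
    mags = np.sort(np.abs(box_coeffs(kmax)))[::-1]
    return 1.0 - np.sum(mags[:M] ** 2)

results = {}
for K in (10, 100, 1000):
    M_lin, e_lin = linear_error(K)
    M_nl = 2 * K + 1                          # X_0 plus K pairs of nonzero odd terms
    e_nl = nonlinear_error(M_nl, kmax=4 * K)  # same error as the linear case, fewer terms
    results[K] = (M_lin * e_lin, M_nl * e_nl)
    print(K, results[K])
print("limits:", 4 / np.pi**2, 2 / np.pi**2)
```

Both products settle to constants, which is the $1/M$ decay; the nonlinear constant is roughly half the linear one because the zero terms are skipped.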



5.4.4 Karhunen–Loève Transform

A cornerstone result for the linear approximation of stochastic processes is the optimality of the Karhunen–Loève transform (KLT). The result can be derived in many forms, but they all hinge on calculating the eigendecomposition of the autocorrelation of the process, and keeping the components corresponding to the largest eigenvalues in that decomposition.
KLT for Random Vectors Let $\mathbf{x} = [x_0\ x_1\ \ldots\ x_{N-1}]^T$ be an $N$-dimensional random vector. For convenience, we assume all $x_k$ to have mean zero, and we denote the autocorrelation matrix of $\mathbf{x}$ by

$$A = E[\mathbf{x}\mathbf{x}^*]. \qquad (5.86)$$

We want to answer the following question: What is the best orthonormal basis for $\mathbb{C}^N$, for which a linear approximation on a subspace of any dimension $M < N$ minimizes the expected quadratic error? In other words, we search for a set of orthonormal vectors $\{\varphi_k\}_{k=0}^{N-1}$ forming a basis for $\mathbb{C}^N$, such that the expected quadratic error between $\mathbf{x}$ and its approximation $\widehat{\mathbf{x}}$, given below, is minimized,

$$\mathbf{x} = \sum_{k=0}^{N-1} \langle \mathbf{x}, \varphi_k\rangle \varphi_k, \qquad \widehat{\mathbf{x}} = \sum_{k=0}^{M-1} \langle \mathbf{x}, \varphi_k\rangle \varphi_k. \qquad (5.87)$$



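This quadratic error is

$$\begin{aligned}
E[\|\mathbf{x}-\widehat{\mathbf{x}}\|^2] &= E[\langle \mathbf{x}-\widehat{\mathbf{x}}, \mathbf{x}-\widehat{\mathbf{x}}\rangle]
\stackrel{(a)}{=} E\Big[\Big\langle \sum_{k=M}^{N-1}\langle \mathbf{x},\varphi_k\rangle\varphi_k,\ \sum_{m=M}^{N-1}\langle \mathbf{x},\varphi_m\rangle\varphi_m\Big\rangle\Big] \\
&\stackrel{(b)}{=} E\Big[\sum_{k=M}^{N-1}\sum_{m=M}^{N-1}\big\langle \langle \mathbf{x},\varphi_k\rangle\varphi_k,\ \langle \mathbf{x},\varphi_m\rangle\varphi_m\big\rangle\Big]
\stackrel{(c)}{=} E\Big[\sum_{k=M}^{N-1}\sum_{m=M}^{N-1}\langle \mathbf{x},\varphi_k\rangle\langle \mathbf{x},\varphi_m\rangle^*\langle \varphi_k,\varphi_m\rangle\Big] \\
&\stackrel{(d)}{=} E\Big[\sum_{k=M}^{N-1}\sum_{m=M}^{N-1}\langle \mathbf{x},\varphi_k\rangle\langle \varphi_m,\mathbf{x}\rangle\,\delta_{k-m}\Big]
\stackrel{(e)}{=} E\Big[\sum_{k=M}^{N-1}\langle \mathbf{x},\varphi_k\rangle\langle \varphi_k,\mathbf{x}\rangle\Big]
\stackrel{(f)}{=} E\Big[\sum_{k=M}^{N-1}\varphi_k^*\mathbf{x}\mathbf{x}^*\varphi_k\Big] \\
&\stackrel{(g)}{=} \sum_{k=M}^{N-1}\varphi_k^*\,E[\mathbf{x}\mathbf{x}^*]\,\varphi_k
\stackrel{(h)}{=} \sum_{k=M}^{N-1}\varphi_k^*A\varphi_k
\stackrel{(i)}{=} \sum_{k=M}^{N-1}\langle A\varphi_k,\varphi_k\rangle,
\end{aligned} \qquad (5.88)$$

where (a) follows from (5.87); (b) from the linearity of the inner product; (c) from linearity of the inner product in the first argument and conjugate linearity in the second argument; (d) from the orthonormality of the basis; (e) from the sum in $m$ being nonzero only for $m = k$; in (f) we wrote the inner products in vector notation; (g) follows from the linearity of expectation; (h) from the definition of the autocorrelation matrix (5.86); and in (i) we used inner-product notation again.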



Theorem 5.8 (Karhunen–Loève transform) For $1 \leq M < N$, the expected quadratic error between $\mathbf{x}$ and its linear approximation on a subspace of dimension $M$ is minimized by the basis $\{\varphi_k\}_{k=0}^{N-1}$ consisting of eigenvectors of $A$ ordered by decreasing eigenvalues,

$$A\varphi_k = \lambda_k\varphi_k, \qquad k = 0, 1, \ldots, N-1, \qquad (5.89a)$$
$$\lambda_k \geq \lambda_{k+1}, \qquad k = 0, 1, \ldots, N-2. \qquad (5.89b)$$

Proof. We briefly sketch the proof. Minimizing (5.88) is equivalent to maximizing

$$\sum_{k=0}^{M-1} \langle A\varphi_k, \varphi_k\rangle \qquad (5.90)$$

because the basis is orthonormal, so that $\sum_{k=0}^{N-1}\langle A\varphi_k, \varphi_k\rangle = \operatorname{tr}(A)$ does not depend on the choice of basis. Since $A$ is Hermitian and positive semidefinite, all its eigenvalues are real and nonnegative. Assume for simplicity that the eigenvalues are distinct. Starting with $M = 1$, choose $\varphi_0$ such that

$$\varphi_0 = \arg\max_{\|\varphi\| = 1} \langle A\varphi, \varphi\rangle,$$

that is, the eigenvector corresponding to the largest eigenvalue. Assuming the first $K$ vectors have been chosen so as to maximize $\sum_{k=0}^{K-1}\langle A\varphi_k, \varphi_k\rangle$, how do we choose $\varphi_K$? It has to be of norm 1 and orthogonal to $\operatorname{span}(\{\varphi_0, \varphi_1, \ldots, \varphi_{K-1}\})$. This maximization leads to the eigenvector with the next largest eigenvalue. If the eigenvalues are not distinct, then one can take any norm-1 linear combination of the eigenvectors corresponding to a repeated eigenvalue.
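To see Theorem 5.8 at work, the sketch below evaluates the expected error (5.88) for the KLT basis and for an arbitrary orthonormal basis. The autocorrelation matrix $A$ with entries $0.9^{|i-j|}$, the dimension $N = 8$, and the random competing basis are assumptions made for illustration; `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed to match (5.89b).

```python
import numpy as np

def expected_error(A, Phi, M):
    """E||x - x_hat||^2 = sum_{k=M}^{N-1} phi_k^* A phi_k, as in (5.88).

    Columns of Phi are the orthonormal basis vectors phi_k.
    """
    tail = Phi[:, M:]
    return float(np.real(np.einsum("ik,ij,jk->", tail.conj(), A, tail)))

def klt_basis(A):
    """Eigenvectors of the Hermitian matrix A, ordered by decreasing eigenvalue, (5.89)."""
    lam, V = np.linalg.eigh(A)        # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]
    return lam[order], V[:, order]

N = 8
n = np.arange(N)
A = 0.9 ** np.abs(n[:, None] - n[None, :])   # hypothetical autocorrelation, a_n = 0.9^|n|
lam, Phi_klt = klt_basis(A)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # an arbitrary competing orthonormal basis

for M in range(1, N):
    # KLT error equals the sum of the discarded eigenvalues and is never larger.
    print(M, round(expected_error(A, Phi_klt, M), 4), round(expected_error(A, Q, M), 4))
```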



KLT for WSS Processes We now extend the idea of the KLT from random vectors to WSS processes. Assume $x_n$ is WSS with zero mean and autocorrelation $a_n$ with sufficient decay so that $a \in \ell^1(\mathbb{Z})$. The power spectral density is (see (2.232))

$$a_n \ \stackrel{\text{DTFT}}{\longleftrightarrow}\ A(e^{j\omega}), \qquad (5.91)$$

which is nonnegative. Here, the autocorrelation matrix $A$ is an infinite-dimensional Toeplitz matrix with diagonals given by $a_n$,

$$A = E[\mathbf{x}\mathbf{x}^*] = \begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & a_0 & a_1 & a_2 & \cdots \\ \cdots & a_1 & a_0 & a_1 & \cdots \\ \cdots & a_2 & a_1 & a_0 & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. \qquad (5.92)$$

This Toeplitz operator has a continuous spectrum, given by the power spectral density $A(e^{j\omega})$, rather than a discrete set of eigenvalues, and under the $\ell^1(\mathbb{Z})$ assumption, the eigensequences are the DTFT sequences $e^{j\omega k}$, $k \in \mathbb{Z}$, $\omega \in [0, 2\pi)$. The subspace approximation of size $M$ in $\mathbb{R}^N$ becomes a subset approximation of the spectrum, namely a set $S \subset [0, 2\pi)$ such that

$$\frac{|S|}{2\pi} = \frac{M}{N}, \qquad \omega \in S \ \text{ if } \ A(e^{j\omega}) \geq A(e^{j\omega'}) \ \text{ for all } \omega' \notin S.$$
Figure 5.26: KLT for an AR-1 process is the truncation of the monotonically decreasing power spectral density $A_y(e^{j\omega})$ from (5.93) to $[-\pi/2, \pi/2)$.



We illustrate this on an AR-1 process.

Example 5.16 (KLT for an AR-1 process) Take an i.i.d. process $x_n$ as input to an AR-1 model with transfer function $1/(1 - a z^{-1})$. According to (2.234), the power spectral density of the filtered process $y_n$ is

$$A_y(e^{j\omega}) = \frac{1}{(1 - a e^{-j\omega})(1 - a e^{j\omega})} = \frac{1}{1 + a^2 - 2a\cos\omega}$$

(see Example 4.25). Choose $a = 0.9$. The power spectral density is

$$A_y(e^{j\omega}) = \frac{1}{1.81 - 1.8\cos\omega}, \qquad (5.93)$$

and varies from 100 at $\omega = 0$ to $(3.61)^{-1} \approx 0.277$ at $\omega = \pi$. Since the power spectral density is monotonically decreasing on $[0, \pi]$, it is easy to find a set $S$ of any given size. Choose $|S| = \pi$. The KLT is then the projection of $y_n$ onto $\mathrm{BL}[-\pi/2, \pi/2]$, which we have seen in Example 4.25; see Figure 5.26.
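For Example 5.16, a finite $N \times N$ section of the Toeplitz matrix (5.92) gives a concrete picture. The sketch below assumes a unit-variance i.i.d. input, so that $a_n = a^{|n|}/(1-a^2)$ has the power spectral density (5.93), and $N = 1024$ is an arbitrary choice. The eigenvalues stay strictly between the minimum and maximum of $A_y(e^{j\omega})$, and keeping the $N/2$ largest of them retains close to the fraction $\frac{1}{2\pi}\int_{-\pi/2}^{\pi/2} A_y(e^{j\omega})\,d\omega / a_0$ of the power, about 97 percent.

```python
import numpy as np

a, N = 0.9, 1024

def psd(omega):
    """A_y(e^{j omega}) from (5.93), assuming a unit-variance i.i.d. input."""
    return 1.0 / (1.0 + a**2 - 2.0 * a * np.cos(omega))

n = np.arange(N)
acf = a ** n / (1.0 - a**2)                     # a_n = a^|n| / (1 - a^2) for that input
A = acf[np.abs(n[:, None] - n[None, :])]        # N x N section of the Toeplitz matrix (5.92)
lam = np.sort(np.linalg.eigvalsh(A))[::-1]      # eigenvalues, largest first

kept_klt = lam[: N // 2].sum() / np.trace(A)    # power kept by the KLT with M = N/2

L = 200_000                                     # midpoint rule on S = [-pi/2, pi/2)
omega = -np.pi / 2 + (np.arange(L) + 0.5) * np.pi / L
kept_theory = 0.5 * psd(omega).mean() / acf[0]  # (1/2pi) * integral over S of A_y, over a_0

print(lam.max(), lam.min())                     # inside (1/3.61, 100)
print(kept_klt, kept_theory)
```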

5.5 Compression and Transform Coding 

The previous sections concentrated on approximating a given function or a sequence 
by using a subset of expansion coefficients. In this section, we address issue (iii) 
from the beginning of this chapter, that is, what to do when the chosen expansion 
coefficients require too much storage or bandwidth for transmission. In other words, 
we compress. 

Everyday compression problems are unmanageable without a divide and conquer approach.^94 Effective compression of images, for example, depends on the tendencies of pixels to be similar to their neighbors, or to differ in partially predictable ways. These tendencies, arising from the continuity, texturing, and boundaries of objects, the similarity of objects in an image, gradual lighting changes, an artist's technique and color palette, etc., may extend over an entire image with a quarter million pixels. Yet the most general way to utilize the probable relationships between pixels (later described as unconstrained source coding) is infeasible for this many pixels. In fact, 16 pixels is a lot for an unconstrained source code.

^94 The divide and conquer approach is central to many endeavors, ranging from children managing inconsistent parents to colonial powers controlling native peoples. In engineering and computational science, it means breaking a big problem into smaller problems that can be more easily understood and solved. Putting the pieces back together gives a modular design, which is advantageous for implementation, testing, and component reuse.

To conquer the compression problem (allowing, for example, more than 16 pixels to be encoded simultaneously), state-of-the-art lossy compressors divide the encoding operation into a sequence of three relatively simple steps: the computation of a linear transformation of the data designed primarily to produce uncorrelated coefficients, separate quantization of each scalar coefficient, and entropy coding. This process is called transform coding. In image compression, a square image with $N$ pixels is typically processed with simple linear transforms (often DCTs or DWTs) of size $\sqrt{N}$.

This section explains the fundamental principles of transform coding; these 
principles apply equally well to images, audio, video, and various other types of 
data, so abstract formulations are given. For more details, see Further Reading. 

Source Coding Source coding means to represent information in bits, with the 
natural aim of using a small number of bits. When the information can be exactly 
recovered from the bits, the source coding or compression is called lossless; oth- 
erwise, it is called lossy. The transform codes in this section are lossy. However, 
lossless entropy codes appear as components of transform codes, so both lossless 
and lossy compression are of present interest. 

In our discussion, the "information" is denoted by a real column vector $\mathbf{x} \in \mathbb{R}^N$ or a sequence of such vectors. A vector might be formed from pixel values in an image or by sampling an audio signal; $KN$ pixels can be arranged as a sequence of $K$ vectors of length $N$. The vector length $N$ is defined such that each vector in a sequence is encoded independently. For the purpose of building a mathematical theory, the source vectors are assumed to be realizations of a random vector $\mathbf{x}$ with a known distribution. The distribution could be purely empirical.

A source code consists of two mappings: an encoder and a decoder. The encoder maps any vector $\mathbf{x} \in \mathbb{R}^N$ to a finite string of bits, and the decoder maps any of these strings of bits to an approximation $\widehat{\mathbf{x}} \in \mathbb{R}^N$. The encoder mapping can always be factored as $\gamma \circ \beta$, where $\beta$ is a mapping from $\mathbb{R}^N$ to some discrete set $\mathcal{I}$ and $\gamma$ is an invertible mapping from $\mathcal{I}$ to strings of bits. The former is called a lossy encoder and the latter a lossless code or an entropy code. The decoder inverts $\gamma$ and then approximates $\mathbf{x}$ from the index $\beta(\mathbf{x}) \in \mathcal{I}$. This is shown in the top half of Figure 5.27. It is assumed that communication between the encoder and decoder is perfect.

To assess the quality of a lossy source code, we need numerical measures of 
approximation accuracy and description length. The measure for description length 
is simply the expected number of bits output by the encoder divided by N; this is 
called the rate in bits per scalar sample and denoted by R. Here we will measure 
approximation accuracy by squared Euclidean norm divided by the vector length:

$$d(\mathbf{x}, \widehat{\mathbf{x}}) = \frac{1}{N}\|\mathbf{x} - \widehat{\mathbf{x}}\|^2 = \frac{1}{N}\sum_{n=0}^{N-1} (x_n - \widehat{x}_n)^2.$$
Figure 5.27: Any source code can be decomposed so that the encoder is $\gamma \circ \beta$ and the decoder is $\widehat{\beta} \circ \gamma^{-1}$, as shown at top. $\gamma$ is an entropy code, and $\beta$ and $\widehat{\beta}$ are the encoder and decoder of an $N$-dimensional quantizer. In a transform code, $\beta$ and $\widehat{\beta}$ each have a particular constrained structure. In the encoder, $\beta$ is replaced with a linear transform $\Phi$ and a set of $N$ scalar quantizer encoders. The intermediate $\alpha_k$'s are the expansion coefficients, here called transform coefficients. In the decoder, $\widehat{\beta}$ is replaced with $N$ scalar quantizer decoders and another linear transform $\widetilde{\Phi}$. Usually $\widetilde{\Phi} = \Phi^{-1}$.



This accuracy measure is conventional and usually leads to the easiest mathematical results, though the theory of source coding has been developed with quite general measures [10]. The expected value of $d(\mathbf{x}, \widehat{\mathbf{x}})$ is the MSE distortion from (1.65) and is denoted by $D = E[d(\mathbf{x}, \widehat{\mathbf{x}})]$. The normalizations by $N$ make it possible to fairly compare source codes with different lengths.

Fixing $N$, a theoretical concept of optimality is straightforward: A length-$N$ source code is optimal if no other length-$N$ source code with at most the same rate has lower distortion. This concept is of dubious value. First, it is very difficult to check the optimality of a source code. Local optimality (being assured that small perturbations of $\beta$ and $\widehat{\beta}$ will not improve performance) is often the best that can be attained. Second, and of more practical consequence, a system designer gets to choose the value of $N$. It can be as large as the total size of the data set, like the number of pixels in an image, but can also be smaller, in which case the data set is interpreted as a sequence of vectors.

There are conflicting motives in choosing $N$. Compression performance is related to the predictability of one part of $\mathbf{x}$ from the rest. Since predictability can only increase from having more data, performance is usually improved by increasing $N$. The conflict comes from the fact that the computational complexity of encoding is also increased. This is particularly dramatic if one looks at complexities of optimal source codes. The obvious way to implement an optimal encoder is to search through the entire codebook, giving running time exponential in $N$.

State-of-the-art source codes result from an intelligent compromise: instead of attempting to realize an optimal code for a given value of $N$, whose encoding complexity would force a small value for $N$, source codes that are good, but not optimal, are used. Their lower complexities make much larger values of $N$ feasible. The paradoxical conclusion is that the best codes to use in practice are suboptimal.

Constrained Source Coding Transform codes are the most widely used source codes because they are easy to apply at any rate and even with very large values of $N$. The essence of transform coding is the modularization shown in the bottom half of Figure 5.27. The mapping $\beta$ is implemented in two steps. First, an invertible linear transform of the source vector $\mathbf{x}$ is computed, producing $\alpha = \Phi\mathbf{x}$. These are the expansion coefficients we have seen earlier; in coding theory, they are called transform coefficients. The $N$ expansion coefficients are then quantized independently of each other by $N$ scalar quantizers. This is called scalar quantization since each scalar component of $\alpha$ is treated separately. Finally, the quantizer indices that correspond to the transform coefficients are compressed with an entropy code to produce the sequence of bits that represent the data.

To reconstruct an approximation of $\mathbf{x}$, the decoder essentially reverses the steps of the encoder. The entropy code can be inverted to recover the quantizer indices. Then the decoders of the scalar quantizers produce a vector $\mathbf{y}$ of estimates of the expansion coefficients. To complete the reconstruction, a linear transform is applied to $\mathbf{y}$ to produce the approximation $\widehat{\mathbf{x}}$. This final step usually uses the transform $\Phi^{-1}$, but for generality the transform is denoted $\widetilde{\Phi}$.
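The encoder and decoder steps above can be sketched end to end. The following is a minimal illustration, not a production codec; the source (an AR-1 process with $a = 0.9$ cut into vectors of length $N = 8$), the orthonormal DCT-II as $\Phi$, the uniform quantizer step, and the use of empirical entropies as idealized entropy-code rates are all our own assumptions. With an orthonormal $\Phi$, the distortion is essentially the same with or without the transform (about $\Delta^2/12$ per sample), while the rate drops.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector (so Phi^{-1} = Phi^T)."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def empirical_entropy(symbols):
    """Entropy in bits of the empirical distribution of integer symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def transform_code(X, Phi, step):
    """Rows of X are source vectors. Returns (rate in bits/sample, MSE per sample)."""
    alpha = X @ Phi.T                          # transform coefficients, alpha = Phi x
    idx = np.round(alpha / step).astype(int)   # N independent uniform scalar quantizers
    y = idx * step                             # scalar quantizer decoders
    X_hat = y @ Phi                            # Phi_tilde = Phi^{-1} = Phi^T
    N = X.shape[1]
    rate = sum(empirical_entropy(idx[:, k]) for k in range(N)) / N   # one entropy code per coefficient
    return rate, np.mean((X - X_hat) ** 2)

# Hypothetical source: AR-1 process with a = 0.9 and unit-variance innovations.
rng = np.random.default_rng(1)
N, K, a, step = 8, 5000, 0.9, 0.1
w = rng.standard_normal(N * K)
s = np.empty(N * K)
s[0] = w[0] / np.sqrt(1 - a**2)                # start in the stationary distribution
for n in range(1, N * K):
    s[n] = a * s[n - 1] + w[n]
X = s.reshape(K, N)                            # K source vectors of length N

R_dct, D_dct = transform_code(X, dct_matrix(N), step)
R_id, D_id = transform_code(X, np.eye(N), step)   # no transform, for comparison
print(f"DCT:      R = {R_dct:.2f} bits/sample, D = {D_dct:.2e}")
print(f"identity: R = {R_id:.2f} bits/sample, D = {D_id:.2e}")
```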

Most source codes cannot be implemented in the two stages of linear transform 
and scalar quantization. Thus, a transform code is an example of a constrained 
source code. Constrained source codes are, loosely speaking, source codes that are 
suboptimal but have low complexity. The simplicity of transform coding allows 
large values of $N$ to be practical. Computing the transform $\Phi$ requires at most
$N^2$ multiplications and $N(N-1)$ additions. Specially structured transforms, such
as the DFT, DCT, or DWT, are often used to reduce the complexity of this step, 
but this is merely icing on the cake. The great reduction from the exponential 
complexity of a general source code to the (at most) quadratic complexity of a 
transform code comes from using linear transforms and scalar quantization. 

5.5.1 Transform Coding 

The standard theoretical model for transform coding looks like the bottom of Figure 5.27. It has the strict modularity shown, meaning that the transform, quantization, and entropy coding blocks operate independently. In addition, the entropy coder can be decomposed into $N$ parallel entropy coders so that the quantization and entropy coding operate independently on each scalar transform coefficient.

We start by briefly describing the fundamentals of entropy coding and quan- 
tization to provide background for our later focus on the optimization of the trans- 
form, and then address the allocation of bits among the N scalar quantizers. 

Entropy Coding 

Entropy codes are used for lossless coding of discrete random variables. Consider the discrete random variable $x$ with alphabet $\mathcal{I}$. An entropy code $\gamma$ assigns a unique binary string, called a codeword, to each $i \in \mathcal{I}$ (see Figure 5.27).

Since the codewords are unique, an entropy code is always invertible. However, we will place more restrictive conditions on entropy codes so they can be used on sequences of realizations of $x$. The extension of $\gamma$ maps the finite sequence $x_1 x_2 \cdots x_K$ to the concatenation of the outputs of $\gamma$ with each input, $\gamma(x_1)\gamma(x_2)\cdots\gamma(x_K)$. A code is called uniquely decodable if its extension is one-to-one. A uniquely decodable code can be applied to message sequences without adding any "punctuation" to show where one codeword ends and the next begins. For example, in a prefix code, no codeword is the prefix of any other codeword. Prefix codes are guaranteed to be uniquely decodable.

A trivial code numbers each element of $\mathcal{I}$ with a distinct index in $\{0, 1, \ldots, |\mathcal{I}|-1\}$ and maps each element to the binary expansion of its index. Such a code requires $\lceil \log_2 |\mathcal{I}| \rceil$ bits per symbol; using it amounts to not using an entropy code at all. The idea in entropy code design is to minimize the mean number of bits used to represent $x$ at the expense of making the worst-case performance worse. The expected code length is given by

$$L(\gamma) = E[\ell(\gamma(x))] = \sum_{i\in\mathcal{I}} P_x(i)\,\ell(\gamma(i)),$$

where $P_x(i)$ is the probability of symbol $i$ and $\ell(\gamma(i))$ is the length of $\gamma(i)$. The expected length can be reduced if short codewords are used for the most probable symbols, even if that means that some symbols will have codewords with more than $\lceil \log_2 |\mathcal{I}| \rceil$ bits.

The entropy code $\gamma$ is called optimal if it is a prefix code that minimizes $L(\gamma)$. Huffman codes, described shortly, are examples of optimal codes. The performance of an optimal code is bounded by

$$H(x) \leq L(\gamma) < H(x) + 1, \qquad (5.94)$$

where

$$H(x) = -\sum_{i\in\mathcal{I}} P_x(i)\log_2 P_x(i) \qquad (5.95)$$

is the entropy of $x$.

The gap of up to one bit in (5.94) is ignored in the remainder of the section. If $H(x)$ is large, this is justified simply because one bit is small compared to the code length. Otherwise, note that $L(\gamma) \approx H(x)$ can be attained by coding blocks of symbols together; this is detailed in any information theory or data compression textbook.

Huffman Coding There is a simple algorithm, due to Huffman [78], for constructing optimal entropy codes. One starts with a graph with one node for each symbol and no edges. These nodes will become the leaves of a tree as edges are added to make a connected graph.

At each step of the algorithm, the probabilities of the disconnected sets of nodes are sorted and the two least probable sets are merged through the addition of a parent node and edges to each of the two sets. The edges are assigned labels of 0 and 1. When a tree has been formed, codewords are assigned to each leaf node by concatenating the edge labels on the path from the root to the leaf.

Figure 5.28: Huffman code. [Code tree with leaves $\gamma(1) = 00$, $p_1 = 0.3$; $\gamma(2) = 10$, $p_2 = 0.26$; $\gamma(3) = 010$, $p_3 = 0.14$; $\gamma(4) = 011$, $p_4 = 0.13$; $\gamma(5) = 110$, $p_5 = 0.09$; $\gamma(6) = 111$, $p_6 = 0.08$.]

Example 5.17 (Huffman coding) Figure 5.28 shows a Huffman code tree for symbols $\{1, 2, 3, 4, 5, 6\}$ with respective probabilities $\{0.3, 0.26, 0.14, 0.13, 0.09, 0.08\}$. The codewords are boxed. Computing a weighted sum of the codeword lengths gives the expected code length

$$L = 0.3\cdot 2 + 0.26\cdot 2 + 0.14\cdot 3 + 0.13\cdot 3 + 0.09\cdot 3 + 0.08\cdot 3 = 2.44 \text{ bits}.$$

This is quite close to the entropy of 2.41 bits obtained by evaluating (5.95).
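Huffman's algorithm is short enough to sketch directly. The version below is our own minimal implementation using Python's `heapq`; each heap entry carries the partial codewords of one set of merged nodes, and a counter breaks ties so that dictionaries are never compared. The 0/1 edge labels it chooses need not match Figure 5.28, but the codeword lengths, and hence $L = 2.44$ bits, do.

```python
import heapq
import math

def huffman_code(probs):
    """Return a dict symbol -> codeword (string of '0'/'1') built by Huffman's algorithm."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable set
        p1, _, c1 = heapq.heappop(heap)   # second least probable set
        # New parent node: prepend its edge label to every codeword below it.
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {1: 0.3, 2: 0.26, 3: 0.14, 4: 0.13, 5: 0.09, 6: 0.08}
code = huffman_code(probs)
L = sum(p * len(code[s]) for s, p in probs.items())     # expected code length
H = -sum(p * math.log2(p) for p in probs.values())      # entropy (5.95)
print(code)
print(f"L = {L:.2f} bits, H = {H:.2f} bits")
```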



Quantization 

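A quantizer $q$ is a mapping from a source alphabet $\mathbb{R}^N$ to a reproduction codebook $\mathcal{C} = \{\widehat{\mathbf{x}}_i\}_{i\in\mathcal{I}} \subset \mathbb{R}^N$, where $\mathcal{I}$ is an arbitrary countable index set. Quantization can be decomposed into two operations $q = \widehat{\beta} \circ \beta$, as shown in Figure 5.27. The lossy encoder $\beta: \mathbb{R}^N \to \mathcal{I}$ is specified by a partition of $\mathbb{R}^N$ into partition cells $S_i = \{\mathbf{x} \in \mathbb{R}^N \mid \beta(\mathbf{x}) = i\}$, $i \in \mathcal{I}$. The reproduction decoder $\widehat{\beta}: \mathcal{I} \to \mathbb{R}^N$ is specified by the codebook $\mathcal{C}$. If $N = 1$, the quantizer is called a scalar quantizer; for $N > 1$, it is a vector quantizer.

The quality of a quantizer is determined by its distortion and rate. The MSE distortion for quantizing a random vector $\mathbf{x} \in \mathbb{R}^N$ is

$$D = \frac{1}{N}\,E\left[\|\mathbf{x} - \widehat{\beta}(\beta(\mathbf{x}))\|^2\right].$$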

The rate can be measured in a few ways. The lossy encoder output $\beta(\mathbf{x})$ is a discrete random variable that is typically entropy coded because the output symbols will have unequal probabilities. Associating an entropy code $\gamma$ with the quantizer gives a variable-rate quantizer specified by $(\beta, \widehat{\beta}, \gamma)$. The rate of the quantizer is the expected code length of $\gamma$ divided by $N$. Not specifying an entropy code (or specifying the use of fixed-rate binary expansion) gives a fixed-rate quantizer with rate $R = N^{-1}\log_2|\mathcal{I}|$. Measuring the rate by the idealized performance of an entropy code gives $R = N^{-1}H(\beta(\mathbf{x}))$; the quantizer in this case is called entropy-constrained. These rates are summarized in Table 5.2.

Table 5.2: Rates for different quantizers.

Quantizer              Rate R
Variable rate          $N^{-1} L(\gamma)$
Fixed rate             $N^{-1} \log_2 |\mathcal{I}|$
Entropy constrained    $N^{-1} H(\beta(\mathbf{x}))$

The optimal performance of entropy-constrained quantization is better than that of both variable-rate and fixed-rate quantization. Entropy coding, however, adds complexity, and variable-length output can create difficulties such as buffer overflows. Furthermore, entropy-constrained quantization is only an idealization, since an actual entropy code will in general not meet the lower bound in (5.94).

Optimal Quantization An optimal quantizer is one that either minimizes the distortion subject to an upper bound on the rate, or minimizes the rate subject to an upper bound on the distortion. Because of simple shifting and scaling properties, an optimal quantizer for a random variable $x$ can be easily deduced from an optimal quantizer for the normalized random variable $(x - \mu_x)/\sigma_x$, where $\mu_x$ and $\sigma_x$ are the mean and standard deviation of $x$, respectively. One consequence of this is that optimal quantizers have performance

$$D = \sigma_x^2\, g(R), \qquad (5.96)$$

where $g(R)$ is the performance of optimal quantizers for the normalized source. Equation (5.96) holds, with a different function $g$, for any family of quantizers that can be described by its operation on a normalized variable, not just for optimal quantizers.

The rate measure affects the optimal encoding rule because $\beta(x)$ should be the index that minimizes a Lagrangian cost function including both rate and distortion. Only for fixed-rate quantization does the optimal encoding rule simplify to finding the index corresponding to the nearest codeword.

In some of the more technical discussions that follow, one property of optimal decoding is relevant: The optimal decoder $\widehat{\beta}$ computes

$$\widehat{\beta}(i) = E[x \mid x \in S_i],$$

which is called centroid reconstruction. The conditional mean of the cell, or centroid, is the minimum MSE estimate (see Section 1.4.4).
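Alternating the nearest-codeword encoding rule with centroid reconstruction gives Lloyd's algorithm for fixed-rate scalar quantizers, a standard technique that the text does not name here. The sketch below runs it on samples of a unit-variance Gaussian; the training-set formulation, quantile initialization, sample size, and iteration count are our own choices. The printed distortions can be compared with the high-resolution approximation (5.98) discussed next.

```python
import numpy as np

def lloyd(samples, levels, iters=300):
    """Lloyd's algorithm on a training set: alternate the nearest-codeword
    (fixed-rate) encoding rule and centroid reconstruction."""
    # Initial codebook: evenly spaced quantiles of the data (sorted).
    codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        thresholds = (codebook[:-1] + codebook[1:]) / 2     # boundaries of the cells S_i
        cells = np.digitize(samples, thresholds)            # encoder: nearest codeword index
        sums = np.bincount(cells, weights=samples, minlength=levels)
        counts = np.bincount(cells, minlength=levels)
        codebook = sums / np.maximum(counts, 1)             # decoder: centroid E[x | x in S_i]
    thresholds = (codebook[:-1] + codebook[1:]) / 2
    x_hat = codebook[np.digitize(samples, thresholds)]
    return codebook, np.mean((samples - x_hat) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)          # unit-variance Gaussian source
D = {}
for R in (1, 2, 3):                      # fixed rate: 2**R cells
    codebook, D[R] = lloyd(x, 2 ** R)
    print(R, np.round(codebook, 3), D[R], np.sqrt(3) * np.pi / 2 * 2.0 ** (-2 * R))
```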

High-Resolution Quantization For most sources, it is impossible to express the performance of optimal quantizers analytically. Thus, aside from using (5.96), approximations must suffice. Fortunately, approximations obtained under the assumption that the quantization is very fine are reasonably accurate even at low to moderate rates.

High-resolution analysis is based on approximating the PDF $f_x$ on the interval $S_i$ by its value at the midpoint. Assuming $f_x$ is smooth, this approximation is accurate when each $S_i$ is short. Optimization of scalar quantizers then turns into finding the optimal lengths for the $S_i$'s, depending on the PDF $f_x$.

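The performance of optimal fixed-rate quantization is approximately

$$D \approx \frac{1}{12}\left(\int_{-\infty}^{\infty} f_x^{1/3}(t)\,dt\right)^{3} 2^{-2R}. \qquad (5.97)$$

For a Gaussian random variable, whose PDF was given in (1.236), (5.97) yields

$$D \approx \frac{\sqrt{3}\,\pi}{2}\,\sigma^2\, 2^{-2R}. \qquad (5.98)$$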



For entropy-constrained quantization, high-resolution analysis shows that it is optimal for each $S_i$ to have equal length. A quantizer that partitions with equal-length intervals is called uniform.⁹⁵ The resulting performance is

$$D \approx \frac{1}{12}\, 2^{2h(x)}\, 2^{-2R}, \qquad (5.99)$$

where

$$h(x) = -\int_{-\infty}^{\infty} f_x(t)\log_2 f_x(t)\,dt$$

is the differential entropy of $x$. For a Gaussian random variable, (5.99) simplifies to

$$D \approx \frac{\pi e}{6}\,\sigma^2\, 2^{-2R}. \qquad (5.100)$$

Summarizing (5.97) to (5.100), we see that high-resolution quantizer performance is described by

$$D \approx c\,\sigma^2\, 2^{-2R}, \qquad (5.101)$$

where $\sigma^2$ is the variance of the source and $c$ is a constant that depends on the normalized density of the source and the type of quantization (fixed-rate, variable-rate, or entropy-constrained). This is consistent with (5.96).

The computations we have made are for scalar quantization. For vector quantization, the best performance in the limit as the dimension $N$ grows is given by the distortion-rate function. For a Gaussian source, this bound is

$$D = \sigma^2\, 2^{-2R}. \qquad (5.102)$$

The approximate performance given by (5.100) is worse only by a factor of $\pi e/6$ ($\approx 1.53$ dB). This can be expressed as a redundancy of $\frac{1}{2}\log_2(\pi e/6) \approx 0.255$ bits. Furthermore, a numerical study has shown that for a wide range of memoryless sources, the redundancy of entropy-constrained uniform quantization is at most 0.3 bits per sample at all rates.
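The constants in (5.98) and (5.100), and the 1.53 dB (0.255 bit) gap, can be checked numerically. The sketch below is plain Python; the helper names `integrate` and `std_normal_pdf` are illustrative, not from the text. It evaluates the high-resolution formulas (5.97) and (5.99) for a unit-variance Gaussian with a midpoint rule truncated to $[-12, 12]$, which is assumed to be wide enough for the tails to be negligible.

```python
import math

def integrate(f, lo=-12.0, hi=12.0, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dt = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dt) for k in range(n)) * dt

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

# Fixed-rate constant of (5.97) for unit variance: (1/12) * (integral of f^(1/3))^3.
c_fixed = integrate(lambda t: std_normal_pdf(t) ** (1.0 / 3.0)) ** 3 / 12.0

# Entropy-constrained constant of (5.99): (1/12) * 2^(2h), with h in bits.
h_bits = -integrate(lambda t: std_normal_pdf(t) * math.log2(std_normal_pdf(t)))
c_ecsq = 2.0 ** (2.0 * h_bits) / 12.0

# Gap of (5.100) relative to the distortion-rate bound (5.102).
gap_db = 10.0 * math.log10(c_ecsq)
gap_bits = 0.5 * math.log2(c_ecsq)

print(f"fixed-rate c = {c_fixed:.4f}, sqrt(3)*pi/2 = {math.sqrt(3) * math.pi / 2:.4f}")
print(f"ECSQ c       = {c_ecsq:.4f}, pi*e/6       = {math.pi * math.e / 6:.4f}")
print(f"gap          = {gap_db:.2f} dB = {gap_bits:.3f} bits")
```

Both constants are high-resolution approximations, so they describe the behavior as $R$ grows rather than the exact performance at any particular rate.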

⁹⁵ While the design of quantizers has a deep theory, the fact remains: "Most quantizers today are indeed uniform and scalar, but are combined with prediction or transforms" [64].

[Figure 5.29: two staircase plots of the quantizer output q(x), panels (a) and (b).]

Figure 5.29: Uniform quantization. (a) Rounding to the nearest integer. (b) Rounding to the nearest half-integer.



Uniform Quantization Though definitions of uniform quantization vary somewhat, the archetype of rounding is always an example of uniform quantization. Figure 5.29(a) shows the input-output relationship of a device that rounds to the nearest integer multiple of the step size $\Delta$. In formal notation, the encoder could be $\alpha(x) = \mathrm{round}(x/\Delta)$, where $\mathrm{round}(\cdot)$ denotes rounding to the nearest integer, with the corresponding decoder $\beta(i) = i\Delta$.
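A minimal sketch of the two uniform quantizers of Figure 5.29, written as separate encoder/decoder pairs to mirror $\alpha$ and $\beta$. The function names are hypothetical, and ties in the rounding quantizer are assumed to go upward.

```python
import math

def alpha_round(x, delta):
    """Encoder of Figure 5.29(a): index of the nearest integer multiple of delta.

    math.floor(u + 0.5) is used instead of round(), because Python's
    round() sends ties to the nearest even integer.
    """
    return math.floor(x / delta + 0.5)

def beta_round(i, delta):
    """Decoder of Figure 5.29(a): reconstruction i * delta."""
    return i * delta

def alpha_shifted(x, delta):
    """Encoder of the shifted quantizer of Figure 5.29(b): cell [i*delta, (i+1)*delta)."""
    return math.floor(x / delta)

def beta_shifted(i, delta):
    """Decoder of Figure 5.29(b): reconstruction at the cell midpoint (i + 1/2) * delta."""
    return (i + 0.5) * delta

delta = 0.5
for x in (-0.3, 0.24, 0.26, 0.74, 1.1):
    print(x, beta_round(alpha_round(x, delta), delta),
          beta_shifted(alpha_shifted(x, delta), delta))
```

Keeping the encoder and decoder separate makes it easy to swap in a different reconstruction rule (for example, centroid reconstruction) without touching the partition.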

The other common uniform quantizer is shown in Figure 5.29(b). This is a shifted version of the previous uniform quantizer. Variations in the definition of uniform quantization sometimes allow only the encoder to have equal-length cells or only the decoder to have evenly spaced outputs, and may also allow the decoder outputs to be shifted from the centers of the partition cells.

Uniform quantization of a uniform random variable provides a setting to see the typical trade-off between rate and distortion. Consider $x$ uniformly distributed on the interval $[0, 1)$ with the PDF as in (1.235a). A fixed-rate uniform quantizer, as in Figure 5.29(b), with $K$ cells and step size $\Delta = 1/K$, quantizes $x \in [(m-1)\Delta, m\Delta)$ to $(m - \frac{1}{2})\Delta$ for $m = 1, 2, \ldots, K$. Its rate and distortion are

$$R = \log_2 K = -\log_2 \Delta, \qquad (5.103a)$$

$$D = \int_0^1 (x - \hat{x})^2\,dx = \sum_{m=1}^{K} \int_{(m-1)\Delta}^{m\Delta} \left(x - \left(m - \tfrac{1}{2}\right)\Delta\right)^2 dx = \frac{1}{12}\,\Delta^2 = \frac{1}{12}\,2^{-2R}. \qquad (5.103b)$$

We see that when using the MSE distortion measure, the $2^{-2R}$ factor will almost always be present.
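As a sanity check on (5.103b), one can average $(x - \hat{x})^2$ over a dense grid of points in $[0, 1)$ instead of integrating analytically. This is a numerical illustration only, assuming $K = 2^R$ cells so that the rate is an integer number of bits; `empirical_distortion` is a hypothetical helper name.

```python
import math

def empirical_distortion(K, n=64_000):
    """Average of (x - q(x))^2 over n midpoints of [0, 1), for the K-cell
    quantizer that maps [(m-1)/K, m/K) to its midpoint (m - 1/2)/K."""
    delta = 1.0 / K
    total = 0.0
    for j in range(n):
        x = (j + 0.5) / n
        x_hat = (math.floor(x / delta) + 0.5) * delta
        total += (x - x_hat) ** 2
    return total / n

for R in range(1, 6):
    K = 2 ** R
    print(f"R = {R}: D = {empirical_distortion(K):.6e}, "
          f"2^(-2R)/12 = {2.0 ** (-2 * R) / 12:.6e}")
```

Each additional bit halves $\Delta$ and therefore divides the distortion by 4, which is the 6 dB per bit rule implied by the $2^{-2R}$ factor.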



Bit Allocation 

Coding (quantizing and entropy coding) each expansion coefficient separately splits the total number of bits among the transform coefficients in some manner, implying some sort of bit allocation among the components.

Bit allocation problems can be stated in a single common form: One is given a set of quantizers described by their distortion-rate performances as

$$D_i = g_i(R_i), \quad R_i \in \mathcal{R}_i, \quad i = 1, 2, \ldots, N.$$

Each set of available rates $\mathcal{R}_i$ is a subset of the nonnegative real numbers and may be discrete or continuous. The problem is to

$$\text{minimize } D = \frac{1}{N}\sum_{i=1}^{N} D_i \quad \text{subject to} \quad \frac{1}{N}\sum_{i=1}^{N} R_i \le R.$$

If the average distortion can be reduced by taking bits away from one component and giving them to another, the initial bit allocation is not optimal. Applying this reasoning with infinitesimal changes in the component rates, a necessary condition for an optimal allocation is that the slope of each $g_i$ at $R_i$ is equal to a common constant value.

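The approximate performance given by (5.101) leads to a particularly easy bit allocation problem with

$$g_i(R_i) = c_i\,\sigma_i^2\, 2^{-2R_i}, \quad \mathcal{R}_i = [0, \infty), \quad i = 1, 2, \ldots, N. \qquad (5.104)$$

Ignoring the fact that each component rate must be nonnegative, an equal-slope argument shows that the optimal bit allocation is

$$R_i = R + \frac{1}{2}\log_2 \frac{c_i\,\sigma_i^2}{\left(\prod_{j=1}^{N} c_j\right)^{1/N}\left(\prod_{j=1}^{N}\sigma_j^2\right)^{1/N}}, \quad i = 1, 2, \ldots, N.$$

With these rates, all the $D_i$'s are equal and the average distortion is

$$D = \left(\prod_{i=1}^{N} c_i\right)^{1/N}\left(\prod_{i=1}^{N}\sigma_i^2\right)^{1/N} 2^{-2R}. \qquad (5.105)$$

This solution is valid when each $R_i$ given above is nonnegative. For lower rates, the components with the smallest $c_i\sigma_i^2$ are allocated no bits and the remaining components have correspondingly higher allocations.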
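The allocation rule above, including the fallback of switching off components at low rates, can be sketched as follows. Here `weights[i]` stands for $c_i\sigma_i^2$, the function name is hypothetical, and the zero-rate components are chosen by repeatedly removing the smallest-weight component whose unconstrained rate comes out negative (the usual reverse water-filling idea).

```python
import math

def allocate_bits(weights, R):
    """Split an average rate R among components with D_i = w_i * 2^(-2 R_i).

    Applies the closed-form equal-distortion allocation to the active
    components; any component that would receive a negative rate is
    switched off (R_i = 0) and the formula is re-applied to the rest.
    """
    N = len(weights)
    active = list(range(N))
    while True:
        # All N * R bits are shared among the active components only.
        rate_per_active = N * R / len(active)
        log_gm = sum(math.log2(weights[i]) for i in active) / len(active)
        rates = [0.0] * N
        for i in active:
            rates[i] = rate_per_active + 0.5 * (math.log2(weights[i]) - log_gm)
        negative = [i for i in active if rates[i] < 0]
        if not negative:
            return rates
        # Drop the weakest offending component and try again.
        active.remove(min(negative, key=lambda i: weights[i]))

weights = [4.0, 1.0, 0.25, 0.0625]
print(allocate_bits(weights, 2.0))   # [3.5, 2.5, 1.5, 0.5]
print(allocate_bits(weights, 0.5))   # [1.5, 0.5, 0.0, 0.0]
```

At $R = 2$ every component gets bits and all $D_i$ equal $2^{-5}$, matching (5.105). At $R = 0.5$ the two weakest components are switched off.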

Bit Allocation with Uniform Quantizers With uniform quantizers, bit allocation
is nothing more than choosing a step size $\Delta_i$ for each of the $N$ components. The
equal-distortion property of the analytical bit allocation solution gives a simple 
rule: Make all of the step sizes equal. This will be referred to as lazy bit allocation. 
Our development indicates that lazy allocation is optimal when the rate is high. 
Numerical studies have shown that lazy allocation is nearly optimal as long as the 
minimal allocated rate is at least 1 bit. Entropy-constrained uniform quantization 
with lazy bit allocation is used in the numerical examples in the following section. 
5.5.2 Optimal Transforms for Transform Coding

We are now ready for the main event of designing the analysis transform $\tilde{\Phi}$ and the synthesis transform $\Phi$. Throughout this section the source $x$ is assumed to have mean zero, and $\Sigma_x$ denotes the covariance matrix $E[xx^T]$. The source is often, but not always, jointly Gaussian.

A signal given as a vector in $\mathbb{R}^N$ is implicitly represented as a series with
respect to the standard basis. An invertible analysis transform $\tilde{\Phi}$ changes the basis.
A change of basis does not alter the information in a signal, so how can it affect 
coding efficiency? Indeed, if arbitrary source coding is allowed after the transform, 
it does not. The motivating principle of transform coding is that simple coding may 
be more effective in the transform domain than in the original signal space. In the 
standard model, simple coding corresponds to the use of scalar quantization and 
scalar entropy coding. 

Visualizing Transforms Beyond two or three dimensions, it is difficult to visualize vectors, let alone the action of a transform on vectors. Fortunately, we already have an idea of what a linear transform does: it combines rotating, scaling, and shearing such that a hypercube is always mapped to a parallelepiped. For example, in two dimensions, the level curves of a zero-mean Gaussian density are ellipses centered at the origin with collinear major axes, as shown in the left panels of Figure 5.30.⁹⁶ The middle panel of Figure 5.30(a) shows the level curves of the joint density of the expansion coefficients after an arbitrary invertible linear transformation. A linear transformation of an ellipse is still an ellipse, though its eccentricity and orientation (direction of the major axis) may have changed.

The grid in the middle panel indicates the cell boundaries in uniform scalar 
quantization, with equal step sizes, of the transform coefficients. The effect of 
inverting the transform is shown in the right panel; the source density is returned to 
its original form and the quantization partition is linearly deformed. The partition 
in the original coordinates, as shown in the right panel, is what is truly relevant. 
It shows which source vectors are mapped to the same symbol, thus giving some 
indication of the average distortion. Looking at the number of cells with appreciable 
probability gives some indication of the rate. 

A singular transform is a degenerate case. As shown in the middle panel of Figure 5.30(b), the transform coefficients have probability mass only along a line. (A line segment is a degenerate ellipse with unit eccentricity.) Inverting the transform is not possible, but we may still return to the original coordinates to view the partition induced by quantizing the expansion coefficients. The cells are unbounded in one direction, as shown in the right panel. This is undesirable unless the variation of the source in the direction in which the cells are unbounded is very small.

Although better than unbounded cells, the parallelogram-shaped partition cells that arise from arbitrary invertible transforms are inherently suboptimal (see Example 5.18). To get rectangular partition cells, the basis vectors must be orthogonal, as shown in Figure 5.30(c) for the KLT. For square cells, when quantization step sizes are equal for each transform coefficient, the basis vectors should, in addition to being orthogonal, have equal lengths.

⁹⁶ Because the covariance matrix is symmetric, it has an orthogonal set of eigenvectors, and thus orthogonal principal axes.

a3.0 [October 2011] CC by-nc-nd. Comments to book-errata@FourierAndWavelets.org
Fourier and Wavelet Signal Processing, Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

[Figure 5.30: rows (a), (b), (c), each with source level curves (left), quantized expansion coefficients (center), and the induced partition in the original coordinates (right).]

Figure 5.30: Illustration of various basis changes. The Gaussian source is depicted by level curves of the PDF (left). The expansion coefficients are separately quantized with uniform quantizers (center). The induced partitioning is then shown in the original coordinates (right). (a) A basis change generally induces a non-hypercubic partition. (b) A singular transformation gives a partition with unbounded cells. (c) A Karhunen-Loève transform aligns the partitioning with the axes of the source PDF.

Example 5.18 (Shapes of partition cells) The quality of a source code depends on the shapes of the partition cells $\{\alpha^{-1}(i),\ i \in \mathcal{I}\}$ and on varying the sizes of the cells according to the source density. When the rate is high, and either the source is uniformly distributed or the rate is measured by entropy $H(\alpha(x))$, the sizes of the cells should essentially not vary. Then, the quality depends on having cell shapes that minimize the average distance to the center of the cell.

For a given volume, the body in Euclidean space that minimizes the average distance to the center is a sphere. But spheres do not work as partition cell shapes because they do not pack together without leaving interstices. The best cell shape is known only for a few dimensions $N$. One such dimension is $N = 2$, where the hexagonal packing shown in Figure 5.7(b) is best.

The best packings (including the hexagonal case) cannot be achieved with transform codes. Transform codes can only produce partitions into parallelepipeds, as shown for $N = 2$ in Figure 5.30. The best parallelepipeds are cubes. We get a hint of this by comparing a partition of a unit-area square into rectangles with the partition into squares shown in Figure 5.7(a). Both partitions have 36 cells, so each cell has the same area. The partition with square cells gives distortion $1/432 \approx 2.31 \times 10^{-3}$, while the other gives $97/31104 \approx 3.12 \times 10^{-3}$. (The calculations are easy; see our discussion on uniform quantization.)
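The two distortions quoted above can be reproduced exactly with rational arithmetic. The rectangular partition is not described in the text; a 4 × 9 grid of 1/4 × 1/9 rectangles is assumed here because it has 36 cells and reproduces 97/31104. Distortion is taken per component (averaged over the two coordinates), which is the normalization that yields 1/432 for the 6 × 6 square grid. The function name is hypothetical.

```python
from fractions import Fraction

def grid_distortion(cols, rows):
    """Per-component MSE of a cols x rows grid of rectangles on the unit square,
    for a uniform source and reconstruction at each cell center."""
    dx, dy = Fraction(1, cols), Fraction(1, rows)
    # Within a cell, each coordinate error is uniform: variance (step)^2 / 12.
    return (dx ** 2 / 12 + dy ** 2 / 12) / 2

square = grid_distortion(6, 6)
rect = grid_distortion(4, 9)
print(square)   # 1/432
print(rect)     # 97/31104
print(float(square), float(rect))
```

Since both grids have the same cell area, the difference comes entirely from cell shape: the elongated rectangles have a larger second moment than squares of equal area.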

This simple example can also be interpreted as a problem of allocating bits 
between the horizontal and vertical components. The lazy bit allocation arising 
from equal quantization step sizes for each component is optimal. This holds 
generally for high-rate entropy-constrained quantization of components with the 
same normalized density. 

Orthonormal Transforms with Gaussian Sources Consider a jointly Gaussian source, and assume $\tilde{\Phi} = \Phi^T$, that is, the transform is orthonormal. The Gaussian assumption is important because any linear combination of jointly Gaussian random variables is Gaussian. Thus, any analysis transform gives Gaussian expansion coefficients. Then, since the expansion coefficients have the same normalized density, for any reasonable set of quantizers, (5.96) holds with a single function $g(R)$ describing all of the expansion coefficients. Orthogonality is important because orthogonal transforms preserve Euclidean lengths, which gives $d(x, \hat{x}) = d(\alpha, \hat{\alpha})$.

With these assumptions, for any rate and bit allocation a KLT is an optimal 
transform: 



Theorem 5.9 Consider a transform coder with orthogonal analysis/synthesis transforms, $\tilde{\Phi} = \Phi^T$. Suppose there is a single function $g$ to describe the quantization of each expansion coefficient through

$$E[(\alpha_i - \hat{\alpha}_i)^2] = \sigma_i^2\, g(R_i), \quad i = 1, 2, \ldots, N,$$

where $\sigma_i^2$ is the variance of $\alpha_i$ and $R_i$ is the rate allocated to $\alpha_i$. Then, for any bit allocation $(R_1, R_2, \ldots, R_N)$, there exists a KLT that minimizes the distortion.

In the typical case where $g$ is nonincreasing, a KLT that gives $(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)$ sorted in the same order as the bit allocation minimizes the distortion.



Recall that with a high average rate of $R$ bits per component and quantizer performance described by (5.104), the average distortion with optimal bit allocation is given by (5.105). With Gaussian expansion coefficients that are optimally quantized, the distortion simplifies to

$$D = c\left(\prod_{i=1}^{N}\sigma_i^2\right)^{1/N} 2^{-2R}, \qquad (5.106)$$

where $c = \pi e/6$ for entropy-constrained quantization or $c = \sqrt{3}\,\pi/2$ for fixed-rate quantization. The choice of an orthogonal transform is thus guided by minimizing the geometric mean of the expansion coefficient variances.

Theorem 5.10 The distortion given by (5.106) is minimized over all orthogonal transforms by any KLT.



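Proof. Hadamard's inequality states that the determinant of a positive semidefinite matrix is bounded from above by the product of its diagonal elements. Applying it to $\Sigma_\alpha = \tilde{\Phi}\,\Sigma_x\,\tilde{\Phi}^T$, whose diagonal elements are the variances $\sigma_i^2$, gives

$$(\det\tilde{\Phi})(\det\Sigma_x)(\det\tilde{\Phi}^T) = \det\Sigma_\alpha \le \prod_{i=1}^{N}\sigma_i^2.$$

Since $\tilde{\Phi}$ is orthogonal, $(\det\tilde{\Phi})(\det\tilde{\Phi}^T) = (\det\tilde{\Phi})^2 = 1$, so the left-hand side of this inequality equals $\det\Sigma_x$ and is invariant under the choice of $\tilde{\Phi}$. Equality is achieved when a KLT is used, because a KLT makes $\Sigma_\alpha$ diagonal. Thus a KLT minimizes the distortion.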
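Theorem 5.10 can be visualized in two dimensions, where every orthogonal transform is a rotation, possibly followed by a sign flip of one coefficient (which does not change the variances). The sketch below, with hypothetical helper names and an illustrative covariance, computes the coefficient variances after a rotation by $\theta$ and checks that their product never drops below $\det\Sigma_x$, with equality at the KLT angle, as Hadamard's inequality predicts.

```python
import math

def rotated_variances(cov, theta):
    """Diagonal of R cov R^T, where R = [[cos, sin], [-sin, cos]]."""
    (a, b), (_, d) = cov
    co, si = math.cos(theta), math.sin(theta)
    var1 = co * co * a + 2 * co * si * b + si * si * d
    var2 = si * si * a - 2 * co * si * b + co * co * d
    return var1, var2

def klt_angle(cov):
    """Rotation angle that makes R cov R^T diagonal (a 2 x 2 KLT)."""
    (a, b), (_, d) = cov
    return 0.5 * math.atan2(2 * b, a - d)

cov = [[1.0, 0.9], [0.9, 1.0]]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
for theta in (0.0, 0.3, klt_angle(cov)):
    v1, v2 = rotated_variances(cov, theta)
    print(f"theta = {theta:.4f}: product of variances = {v1 * v2:.4f} (det = {det:.4f})")
```

The trace $\sigma_1^2 + \sigma_2^2$ stays fixed under rotation, so minimizing the product amounts to making the two variances as unequal as possible, which is exactly what aligning with the principal axes does.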

Equation (5.106) can be used to define a figure of merit called the coding gain. The coding gain of a transform is a function of its variance vector, $(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)$, and the variance vector without a transform, $\operatorname{diag}(\Sigma_x)$:

$$\text{coding gain} = \frac{\left(\prod_{i=1}^{N} (\Sigma_x)_{ii}\right)^{1/N}}{\left(\prod_{i=1}^{N}\sigma_i^2\right)^{1/N}}.$$

The coding gain is the factor by which the distortion is reduced because of the transform, assuming high rate and optimal bit allocation. The foregoing discussion shows that KLTs maximize coding gain.
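Since a KLT achieves $\prod_i\sigma_i^2 = \det\Sigma_x$ (equality in the proof above), its coding gain can be computed from $\det\Sigma_x$ alone, without an eigendecomposition. The sketch assumes a hypothetical first-order autoregressive covariance $(\Sigma_x)_{ij} = \rho^{|i-j|}$, for which $\det\Sigma_x = (1-\rho^2)^{N-1}$; the function names are illustrative.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return result

def klt_coding_gain(cov):
    """Coding gain of a KLT: geometric mean of diag(cov) over det(cov)^(1/N)."""
    n = len(cov)
    diag_gm = math.prod(cov[i][i] for i in range(n)) ** (1.0 / n)
    return diag_gm / det(cov) ** (1.0 / n)

def ar1_cov(n, rho):
    """Hypothetical AR(1)-type covariance (Sigma_x)_{ij} = rho^|i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

for n in (2, 4, 8):
    g = klt_coding_gain(ar1_cov(n, 0.9))
    print(f"N = {n}: coding gain = {g:.3f} ({10 * math.log10(g):.2f} dB)")
```

For strongly correlated sources the gain grows with the block length $N$, approaching $1/(1-\rho^2)$ as $N$ becomes large for this covariance model.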

5.6 Computational Aspects 

Many algorithms are associated with sampling, interpolation, quantization, and approximation. We discuss a few representative examples.

5.6.1 Optimal Quantization and Clustering 



In Section 5.5.1 we saw simple uniform quantization. A better way to quantize uses nonuniform intervals when the PDF of the random variable(s) is nonuniform.
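Example 5.19 is titled after Lloyd's algorithm, but its body is not present here. The following is a sketch of the standard Lloyd iteration for a scalar source, alternating the nearest-neighbor condition (thresholds at midpoints, as in Exercise 5.4) with the centroid condition (centroid reconstruction). The source is assumed to be a unit-variance Gaussian whose PDF is discretized on a fine grid; the function name and grid parameters are illustrative choices. For two levels, the optimum is $\pm\sqrt{2/\pi}$ with MSE $1 - 2/\pi$, which gives a quick check.

```python
import bisect
import math

def lloyd_gaussian(levels, iters=200, lo=-8.0, hi=8.0, n=160_000):
    """Lloyd's algorithm for a zero-mean, unit-variance Gaussian source.

    The PDF is discretized on a midpoint grid; prefix sums of the probability
    mass and of the first and second moments give each cell's centroid and
    squared error without rescanning the grid.
    """
    dt = (hi - lo) / n
    grid = [lo + (j + 0.5) * dt for j in range(n)]
    mass, m1, m2 = [0.0], [0.0], [0.0]
    for t in grid:
        p = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi) * dt
        mass.append(mass[-1] + p)
        m1.append(m1[-1] + t * p)
        m2.append(m2[-1] + t * t * p)

    def cells(levels):
        # Nearest-neighbor condition: thresholds halfway between levels.
        thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        return [0] + [bisect.bisect_left(grid, th) for th in thresholds] + [n]

    levels = sorted(levels)
    for _ in range(iters):
        cuts = cells(levels)
        # Centroid condition: move each level to the conditional mean of its cell.
        levels = [
            (m1[b] - m1[a]) / (mass[b] - mass[a]) if mass[b] > mass[a] else lev
            for lev, a, b in zip(levels, cuts, cuts[1:])
        ]
    cuts = cells(levels)
    mse = sum(
        (m2[b] - m2[a]) - 2 * lev * (m1[b] - m1[a]) + lev * lev * (mass[b] - mass[a])
        for lev, a, b in zip(levels, cuts, cuts[1:])
    )
    return levels, mse

levels2, d2 = lloyd_gaussian([-1.0, 1.0])
levels4, d4 = lloyd_gaussian([-1.5, -0.5, 0.5, 1.5])
print(levels2, d2)
print(levels4, d4)
```

Each step can only lower (or keep) the MSE, since each condition is optimal given the other; the iteration therefore converges, though in general only to a local optimum.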

Example 5.19 (Lloyd's algorithm) 

One reason for the popularity of the DCT is the existence of a fast, $N\log N$ algorithm for its computation. Because the DCT is a trigonometric transform that resembles a real version of the DFT, it is not surprising that one can use the FFT to compute the DCT.
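One common way to get an $N\log N$ DCT from the FFT is to reorder the input (even-indexed samples, then odd-indexed samples in reverse) and apply a single length-$N$ FFT followed by a phase twiddle. The sketch below, for the unnormalized DCT-II and a power-of-two length, compares that route against the direct $O(N^2)$ sum. It is one of several known FFT-based algorithms and is not necessarily the one the text has in mind; the function names are illustrative.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def dct2_direct(x):
    """Unnormalized DCT-II, X_k = sum_n x_n cos(pi k (2n + 1) / (2N)); O(N^2)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def dct2_fft(x):
    """The same DCT-II via one length-N FFT of a reordered sequence."""
    N = len(x)
    v = x[0::2] + x[1::2][::-1]          # even samples, then odd samples reversed
    V = fft(v)
    return [(cmath.exp(-1j * math.pi * k / (2 * N)) * V[k]).real for k in range(N)]

x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
print(max(abs(a - b) for a, b in zip(dct2_direct(x), dct2_fft(x))))
```

The reordering turns the DCT's half-sample-shifted cosines into the real part of ordinary DFT exponentials, so all the work is in one complex FFT of length $N$.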

5.6.2 Projection Onto Convex Sets 

Another example combines bandlimitedness and quantization using POCS. 
Example 5.20 (Oversampling and quantization) 
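The body of Example 5.20 is not included here; the following is only a sketch of the idea under stated assumptions. The signal is assumed $N$-periodic with DFT support $|k| \le K$ (oversampled, since $N$ is much larger than $2K+1$) and uniformly quantized with step $\Delta$. POCS alternates between projecting onto the bandlimited subspace (zeroing DFT bins) and onto the set of signals consistent with the quantized values (clipping each sample to its cell). Because the original signal lies in both convex sets, neither projection can increase the distance to it, which is what the error trace shows. All names and signal parameters are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, X_k = sum_n x_n e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def project_bandlimited(y, K):
    """Orthogonal projection onto real N-periodic signals with DFT support |k| <= K."""
    N = len(y)
    Y = dft(y)
    Y = [Y[k] if (k <= K or k >= N - K) else 0j for k in range(N)]
    return [v.real for v in idft(Y)]

def project_cells(y, q, delta):
    """Projection onto the quantization cells: clip y_n to [q_n - delta/2, q_n + delta/2]."""
    return [min(max(v, qn - delta / 2), qn + delta / 2) for v, qn in zip(y, q)]

def rms(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

N, K, delta = 64, 3, 0.25
x = [0.8 * math.cos(2 * math.pi * n / N) + 0.5 * math.sin(2 * math.pi * 3 * n / N + 0.4)
     for n in range(N)]
q = [delta * math.floor(v / delta + 0.5) for v in x]     # uniform quantization

y = list(q)
errors = [rms(y, x)]
for _ in range(20):
    y = project_cells(project_bandlimited(y, K), q, delta)
    errors.append(rms(y, x))
print(f"RMS error: quantized {errors[0]:.4f}, after POCS {errors[-1]:.4f}")
```

Ending each iteration with the clipping step keeps the estimate consistent with the quantized data; ending with the bandlimiting step instead would give a bandlimited estimate that may leave some cells.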
Chapter at a Glance

Approximation of Functions by Polynomials

| Method | Approximation criterion | Approximating polynomial $p_K(t)$ | Error |
|---|---|---|---|
| Least-squares | $\min \lVert x - p_K\rVert_2$ | $\sum_{k=0}^{K}\langle x, \varphi_k\rangle\,\varphi_k(t)$ | |
| Lagrange | $p_K(t_k) = x(t_k)$ | $\sum_{k=0}^{K} x(t_k)\prod_{i\ne k}\frac{t - t_i}{t_k - t_i}$ | |
| Taylor series | $p_K^{(k)}(t_0) = x^{(k)}(t_0)$ | $\sum_{k=0}^{K}\frac{x^{(k)}(t_0)}{k!}(t - t_0)^k$ | |
| Minimax | $\min \lVert x - p_K\rVert_\infty$ | | |
| Polynomials | Definition | Recursion | Weight | Interval |
|---|---|---|---|---|
| Legendre $L_k(t)$ | $\frac{(-1)^k}{2^k k!}\frac{d^k}{dt^k}(1-t^2)^k$ | $\frac{2k-1}{k}\,t\,L_{k-1}(t) - \frac{k-1}{k}L_{k-2}(t)$ | $1$ | $[-1, 1]$ |
| Chebyshev $T_k(t)$ | $\cos(k\arccos t)$ | $2t\,T_{k-1}(t) - T_{k-2}(t)$ | $\frac{1}{\sqrt{1-t^2}}$ | $[-1, 1]$ |
| Laguerre $L_k(t)$ | $\frac{e^t}{k!}\frac{d^k}{dt^k}(e^{-t}t^k)$ | $\frac{2k-1-t}{k}L_{k-1}(t) - \frac{k-1}{k}L_{k-2}(t)$ | $e^{-t}$ | $[0, \infty)$ |
| Hermite $H_k^{(a)}(t)$ | | | | $(-\infty, \infty)$ |
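Approximation of Functions by Splines

| Method | Approximation |
|---|---|
| B-splines | $x(t) \approx \sum_{k\in\mathbb{Z}}\langle x(t), \tilde{\beta}^{(N)}(t-k)\rangle_t\,\beta^{(N)}(t-k)$; $\beta^{(0)}(t) = 1$ for $\lvert t\rvert < 1/2$ and $0$ otherwise, with $B^{(0)}(\omega) = \operatorname{sinc}(\omega/2)$; $\beta^{(N)}(t) = (\beta^{(N-1)} * \beta^{(0)})(t)$, with $B^{(N)}(\omega) = (\operatorname{sinc}(\omega/2))^{N+1}$ |
| Orthogonalized splines | $x(t) \approx \sum_{k\in\mathbb{Z}}\langle x(t), \varphi^{(N)}(t-k)\rangle_t\,\varphi^{(N)}(t-k)$, with $\varphi^{(N)}(t) = \sum_{k\in\mathbb{Z}}\alpha_k^{(N)}\beta^{(N)}(t-k)$ |
| Polynomial reproduction | $p_N(t) = \sum_{k\in\mathbb{Z}} a_k\,\beta^{(N)}(t-k)$ |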
Approximation of Functions and Sequences by Series Truncation

Method | Approximation $\hat{x}_M(t)$ | Coefficients used | Error $e_M$




Historical Remarks 

One of the names appearing prominently in this chapter is that of Pafnuty Lvovich Chebyshev (1821-1894), considered to be the founding father of Russian mathematics. His contributions are many, in fields ranging from probability and statistics to number theory. Chebyshev polynomials were described in this chapter for use in minimax approximation; they are also responsible for Chebyshev's name finding its way into signal processing, through the family of Chebyshev filters. As an interesting aside, a crater on the Moon was named after Chebyshev.
The origins of splines are particularly interesting. They date back to shipbuilding; naval engineers needed a method to thread a smooth curve through a given set of points. This resulted in thin wooden strips, splines, bent through the points and held in place by weights called ducks, rats, or dogs. The method was then used in both the aircraft and the automobile industries in the late 1950s and early 1960s. Engineers at Citroën, Renault, and General Motors developed the theory further; in particular, Pierre Bézier, a French engineer working at Renault, became a leader in using mathematical and computational tools in design and manufacturing. With the advent of computers, splines took over from polynomials as a tool for interpolating functions.

On the compression side, transform coding was invented as a method for conserving bandwidth in the transmission of signals output by the analysis unit of a ten-channel vocoder (voice coder) [49]. These correlated, continuous-time, continuous-amplitude signals represented estimates, local in time, of the power in ten contiguous frequency bands. By adding modulated versions of these power signals, the synthesis unit resynthesized speech. The vocoder's ancestor, Pedro, the Voder, was presented at the 1939 World's Fair. Kramer and Mathews [92] showed that the total bandwidth necessary to transmit the signals with a prescribed fidelity can be reduced by transmitting an appropriate set of linear combinations of the signals instead of the signals themselves. This is not source coding because it does not involve discretization. Thus, one could ascribe a later birth to transform coding. Huang and Schultheiss [77] introduced the structure we called the standard model (bottom of Figure 5.27). They studied the coding of Gaussian sources while assuming independent expansion coefficients and optimal fixed-rate scalar quantization. They first showed that $U = T^{-1}$ is optimal and then that $T$ should have orthogonal rows. Transform coding has since spread into almost all aspects of our lives, through its use in popular media standards such as MP3, JPEG, and MPEG.
Further Reading

An excellent introductory text on numerical analysis is by Atkinson [6]. For splines, the magazine review article by Unser [154] gives a thorough overview of splines and their use in signal processing, together with a number of references. The book by Strang and Fix [142] contains further details on polynomial reproduction and the Strang-Fix theorem. For compression and transform coding, the review by Goyal [61] is the basis of Section 5.5 and contains further details and generalizations, as do [33, 58, 60, 64]. Details on high-resolution quantization theory for both scalars and vectors can be found in [57, 64] and references therein.



Exercises with Solutions 

5.1. Chebyshev Polynomials

(i) Using the trigonometric identity

$$\cos((k \pm 1)\theta) = \cos(k\theta)\cos(\theta) \mp \sin(k\theta)\sin(\theta), \qquad (E5.1\text{-}1)$$

prove the recursion (5.17) for Chebyshev polynomials.
(ii) Using (5.17), prove that the Chebyshev polynomials are polynomial functions in $t$.
(iii) Show that under the inner product (5.16), the Chebyshev polynomials satisfy

$$\langle T_n, T_m\rangle = \begin{cases}0, & n \ne m;\\ \pi, & n = m = 0;\\ \pi/2, & n = m > 0.\end{cases}$$

(Hint: The change of variables $\theta = \cos^{-1} t$ simplifies the integrals.)
(iv) Show that the leading coefficient in the Chebyshev polynomial $T_k(t)$, $k \in \mathbb{Z}^+$, is $2^{k-1}$.
(v) Prove the expressions for the zeros (5.18) as well as the extrema (5.19) of Chebyshev polynomials.

Solution:

(i) With $T_k(t) = \cos(k\arccos t)$ as in (5.15), and $\theta = \arccos t$,

$$\begin{aligned}
T_{k+1}(t) &= \cos((k+1)\theta) \overset{(a)}{=} \cos(k\theta)\cos\theta - \sin(k\theta)\sin\theta \\
&= 2\cos(k\theta)\cos\theta - \big(\cos(k\theta)\cos\theta + \sin(k\theta)\sin\theta\big) \\
&\overset{(b)}{=} 2\cos(k\theta)\cos\theta - \cos((k-1)\theta) = 2t\,T_k(t) - T_{k-1}(t),
\end{aligned}$$

where both (a) and (b) follow from (E5.1-1).
(ii) We can prove this by induction. First, $T_0(t) = 1$ and $T_1(t) = t$, so

$$T_2(t) = 2t\,T_1(t) - T_0(t) = 2t^2 - 1,$$

a polynomial function. In the induction step, if $T_k(t) = \sum_{n=0}^{k}\alpha_n t^n$ and $T_{k-1}(t) = \sum_{n=0}^{k-1}\beta_n t^n$ are polynomial functions in $t$, then

$$\begin{aligned}
T_{k+1}(t) &= 2t\,T_k(t) - T_{k-1}(t) = 2t\sum_{n=0}^{k}\alpha_n t^n - \sum_{n=0}^{k-1}\beta_n t^n \\
&= \sum_{n=0}^{k} 2\alpha_n t^{n+1} - \sum_{n=0}^{k-1}\beta_n t^n = \sum_{n=1}^{k+1} 2\alpha_{n-1} t^n - \sum_{n=0}^{k-1}\beta_n t^n \\
&= -\beta_0 + \sum_{n=1}^{k-1}(2\alpha_{n-1} - \beta_n)\,t^n + 2\alpha_{k-1}t^k + 2\alpha_k t^{k+1} = \sum_{n=0}^{k+1}\gamma_n t^n,
\end{aligned}$$

clearly a polynomial function in $t$.
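(iii) Using (5.16), we have

$$\langle T_n, T_m\rangle = \int_{-1}^{1} T_n(t)\,T_m(t)\,(1-t^2)^{-1/2}\,dt = \int_{-1}^{1}\cos(n\arccos t)\cos(m\arccos t)\,(1-t^2)^{-1/2}\,dt.$$

With the change of variables $\theta = \arccos t$, $d\theta = -(1-t^2)^{-1/2}\,dt$, this becomes

$$\langle T_n, T_m\rangle = \int_0^{\pi}\cos(n\theta)\cos(m\theta)\,d\theta.$$

We first solve for $n = m = 0$:

$$\langle T_0, T_0\rangle = \int_{-1}^{1}(1-t^2)^{-1/2}\,dt = \arcsin t\,\Big|_{-1}^{1} = \frac{\pi}{2} - \left(-\frac{\pi}{2}\right) = \pi.$$

Next, we solve for $n = m > 0$:

$$\langle T_n, T_n\rangle = \int_0^{\pi}\cos^2(n\theta)\,d\theta = \left[\frac{\theta}{2} + \frac{\sin(2n\theta)}{4n}\right]_0^{\pi} = \frac{\pi}{2}.$$

Finally, we solve for $n \ne m$:

$$\langle T_n, T_m\rangle = \frac{1}{2}\int_0^{\pi}\big(\cos((n+m)\theta) + \cos((n-m)\theta)\big)\,d\theta = \frac{1}{2}\left[\frac{\sin((n+m)\theta)}{n+m} + \frac{\sin((n-m)\theta)}{n-m}\right]_0^{\pi} = 0.$$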

(iv) We can again use induction to prove this. For $T_1(t) = t$, the leading coefficient is $2^0$. Assuming that the leading coefficient of $T_k(t)$ is $\alpha_k = 2^{k-1}$, the result of (ii) shows that the leading coefficient of $T_{k+1}(t)$ is

$$\gamma_{k+1} = 2\alpha_k = 2\cdot 2^{k-1} = 2^k.$$

(v) From the expression $T_k(t) = \cos(k\arccos t)$, the zeros satisfy

$$k\arccos t = (2m+1)\frac{\pi}{2} \quad\Rightarrow\quad t_m = \cos\left(\frac{2m+1}{2k}\,\pi\right)$$

for $m = 0, 1, \ldots, k-1$. Similarly, the extrema satisfy

$$k\arccos t = m\pi \quad\Rightarrow\quad t_m = \cos\left(\frac{m}{k}\,\pi\right)$$

for $m = 0, 1, \ldots, k$.
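The recursion of part (i) and the claims of parts (iv) and (v) can be spot-checked with exact integer coefficient lists. This is a numerical check, not a proof; the helper names are illustrative.

```python
import math

def chebyshev_coeffs(K):
    """Integer coefficients (lowest degree first) of T_0, ..., T_K from the
    recursion T_{k+1} = 2 t T_k - T_{k-1}, starting at T_0 = 1, T_1 = t."""
    T = [[1], [0, 1]]
    for k in range(1, K):
        two_t_Tk = [0] + [2 * c for c in T[k]]                    # 2 t T_k(t)
        prev = T[k - 1] + [0] * (len(two_t_Tk) - len(T[k - 1]))   # pad T_{k-1}
        T.append([a - b for a, b in zip(two_t_Tk, prev)])
    return T[:K + 1]

def evaluate(coeffs, t):
    """Evaluate a polynomial given by its coefficient list at t."""
    return sum(c * t ** n for n, c in enumerate(coeffs))

T = chebyshev_coeffs(8)
print(T[4])   # [1, 0, -8, 0, 8], i.e. T_4(t) = 8t^4 - 8t^2 + 1
```

Because the coefficients stay integers, the leading-coefficient check $2^{k-1}$ is exact; only the comparisons with $\cos(k\arccos t)$ involve floating point.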

5.2. Strang-Fix Theorem for an Interpolating $\varphi(t)$

Assume an interpolating function $\varphi(t)$ as in (5.70) that is sufficiently localized as in (5.68). Prove that (i) and (ii) in Theorem 5.7 are equivalent.

Solution: Condition (i) means that there exist coefficient sequences $\alpha_n^{(k)}$ such that

$$\sum_{n\in\mathbb{Z}}\alpha_n^{(k)}\,\varphi(t-n) = (t-t_0)^k, \qquad k = 0, 1, \ldots, K. \qquad (E5.2\text{-}1)$$

Because of the interpolation property, when $t = m$,

$$\sum_{n\in\mathbb{Z}}\alpha_n^{(k)}\,\varphi(m-n) = \alpha_m^{(k)} = (m-t_0)^k.$$

Thus, (E5.2-1) becomes

$$\sum_{n\in\mathbb{Z}}(n-t_0)^k\,\varphi(t-n) = (t-t_0)^k, \qquad k = 0, 1, \ldots, K,$$

which, by setting $t_0 = t$, yields

$$\sum_{n\in\mathbb{Z}}(n-t)^k\,\varphi(t-n) = \begin{cases}1, & k = 0;\\ 0, & k = 1, 2, \ldots, K.\end{cases} \qquad (E5.2\text{-}2)$$

The left-hand side is a 1-periodic function and, because of localization (5.68), converges to an $L^1$ function on the interval $[0, 1]$. This function has a Fourier series representation for each $k = 0, 1, \ldots, K$. The coefficients are

$$c_\ell^{(k)} = \int_0^1\sum_{n\in\mathbb{Z}}(n-t)^k\varphi(t-n)\,e^{-j2\pi\ell t}\,dt \overset{(a)}{=} \sum_{n\in\mathbb{Z}}\int_0^1(n-t)^k\varphi(t-n)\,e^{-j2\pi\ell t}\,dt \overset{(b)}{=} \int_{-\infty}^{\infty}(-t)^k\varphi(t)\,e^{-j2\pi\ell t}\,dt \overset{(c)}{=} (-j)^k\,\Phi^{(k)}(2\pi\ell), \qquad (E5.2\text{-}3)$$

where (a) follows from (1.196); in (b) we substitute $t - n \to t$, use $e^{-j2\pi\ell n} = 1$, and merge the individual integrals into a single integral over the real line; and (c) follows from the Fourier transform (3.48a) as well as the differentiation property in frequency in Table 3.2. For $k = 0$, the function in (E5.2-2) is equal to 1, so the Fourier series coefficients are $c_\ell^{(0)} = \delta_\ell$, and thus,

$$\Phi(2\pi\ell) = \delta_\ell, \qquad \ell\in\mathbb{Z}.$$

For $k = 1, 2, \ldots, K$, the function in (E5.2-2) is zero, or

$$c_\ell^{(k)} = 0, \qquad k = 1, 2, \ldots, K,$$

leading to

$$\Phi^{(k)}(2\pi\ell) = 0, \qquad k = 1, 2, \ldots, K,\ \ell\in\mathbb{Z},$$

hence verifying (5.69).
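As a concrete check of (E5.2-2), take the linear B-spline (hat function), which is interpolating and compactly supported, so it satisfies the assumptions with $K = 1$. The sketch, with hypothetical names, evaluates the left-hand side of (E5.2-2) and shows that it equals 1 for $k = 0$ and 0 for $k = 1$, but not 0 for $k = 2$ (it equals $\tau(1-\tau)$ with $\tau$ the fractional part of $t$), so the hat function reproduces polynomials only up to degree 1.

```python
import math

def hat(t):
    """Linear B-spline beta^(1): interpolating, since hat(0) = 1 and hat(n) = 0 for n != 0."""
    return max(0.0, 1.0 - abs(t))

def moment_sum(k, t, phi=hat, reach=3):
    """Left-hand side of (E5.2-2): sum over n of (n - t)^k phi(t - n).

    Only n within `reach` of t is summed, which is exact for the hat function."""
    n0 = math.floor(t)
    return sum((n - t) ** k * phi(t - n) for n in range(n0 - reach, n0 + reach + 1))

for t in (0.0, 0.25, 0.5, 1.7, -2.4):
    print(t, moment_sum(0, t), moment_sum(1, t), moment_sum(2, t))
```

The nonzero $k = 2$ sum is exactly the failure of the corresponding Strang-Fix condition $\Phi''(2\pi\ell) = 0$ for the hat function.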
5.3. Sampling a Discrete-Time Periodic Stream of Kronecker Delta Pulses

Let $x_n$ be a discrete-time periodic signal of period $N$, containing $K$ weighted Kronecker delta pulses at locations $\{n_0, n_1, \ldots, n_{K-1}\}$, $n_\ell \in [0, N-1]$ and $K < \lfloor N/2\rfloor$,

$$x_n = \sum_{\ell=0}^{K-1} c_\ell\,\delta_{n-n_\ell},$$

and let $X_k$ be its DFT from (2.159a).

The sequence $x_n$ is filtered with the time-reversed version of an ideal lowpass filter $h_n = \frac{1}{N}\sum_{k=-K}^{K} W_N^{-kn}$ and downsampled by $M$,

$$y_\ell = \sum_{n=0}^{N-1} x_n\,h_{n-\ell M}, \qquad \ell = 0, 1, \ldots, \frac{N}{M} - 1,$$

where $M$ is an integer divisor of $N$ satisfying $N/M \ge 2K + 1$.

(i) Prove that the DFT coefficients $X_k$, $k \in [-K, K]$, are sufficient to determine the locations of the $K$ weighted Kronecker delta pulses.
(ii) Prove that the $N/M$ samples $y_\ell$ are a sufficient representation of $X_k$, $k \in [-K, K]$, that is, find the relation between $X_k$ and $Y_k$.

Solution:

(i) The DFT coefficients are given by

$$X_k = \sum_{\ell=0}^{K-1} c_\ell\,W_N^{n_\ell k}, \qquad k = 0, 1, \ldots, N-1. \qquad (E5.3\text{-}1)$$

Since $X_k$ is a linear combination of $K$ complex exponentials $W_N^{n_\ell k}$, the locations $n_\ell$ of the Kronecker delta pulses can be found using the following method: find the so-called annihilating filter

$$A(z) = 1 + a_1 z^{-1} + \cdots + a_K z^{-K} = \prod_{\ell=0}^{K-1}\left(1 - W_N^{n_\ell}z^{-1}\right),$$

such that

$$\sum_{k=0}^{K} a_k X_{m-k} = 0, \qquad m = 0, 1, \ldots, N-1. \qquad (E5.3\text{-}2a)$$

If we can find the unknown coefficients $a_k$, $k = 1, 2, \ldots, K$, we will find the unknown Kronecker delta pulse locations $n_\ell$, $\ell = 0, 1, \ldots, K-1$, because $W_N^{n_\ell}$ are the roots of $A(z)$. Since $a_0 = 1$, $K$ equations (E5.3-2a) are sufficient to determine the $K$ unknown filter coefficients $a_k$. Let $m = 1, 2, \ldots, K$; then the system in (E5.3-2a) is equivalent to

$$\sum_{k=1}^{K} a_k X_{m-k} = -X_m, \qquad m = 1, 2, \ldots, K.$$

Writing this in matrix form,

$$\begin{bmatrix} X_0 & X_{-1} & \cdots & X_{-K+1}\\ X_1 & X_0 & \cdots & X_{-K+2}\\ \vdots & \vdots & \ddots & \vdots\\ X_{K-1} & X_{K-2} & \cdots & X_0\end{bmatrix}\begin{bmatrix}a_1\\ a_2\\ \vdots\\ a_K\end{bmatrix} = -\begin{bmatrix}X_1\\ X_2\\ \vdots\\ X_K\end{bmatrix}. \qquad (E5.3\text{-}2b)$$

Because the $X_k$ are linear combinations of $K$ distinct complex exponentials with nonzero weights, the matrix in (E5.3-2b) is of full rank $K$. Thus, there exists a unique solution $\{a_1, a_2, \ldots, a_K\}$. The set of locations $\{n_0, n_1, \ldots, n_{K-1}\}$ can then be found from the zeros of $A(z)$.

The weights of the Kronecker delta pulses are obtained by solving $K$ of the DFT equations (E5.3-1), leading to a Vandermonde system

$$\begin{bmatrix}1 & 1 & \cdots & 1\\ W_N^{n_0} & W_N^{n_1} & \cdots & W_N^{n_{K-1}}\\ \vdots & \vdots & \ddots & \vdots\\ W_N^{n_0(K-1)} & W_N^{n_1(K-1)} & \cdots & W_N^{n_{K-1}(K-1)}\end{bmatrix}\begin{bmatrix}c_0\\ c_1\\ \vdots\\ c_{K-1}\end{bmatrix} = \begin{bmatrix}X_0\\ X_1\\ \vdots\\ X_{K-1}\end{bmatrix},$$

which has a unique solution since the locations are distinct (see (1.231)).
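The procedure of part (i) can be sketched end to end in a few dozen lines. Two implementation choices are assumptions, not from the text: the linear systems are solved with plain Gaussian elimination, and, because the locations are integers in $[0, N-1]$, the zeros of $A(z)$ are located by evaluating $|A(W_N^n)|$ on all $N$ grid points and keeping the $K$ smallest, instead of running a general polynomial root finder. All names and the test configuration are hypothetical.

```python
import cmath
import math

def twiddle(N, e):
    """W_N^e with W_N = exp(-j 2 pi / N)."""
    return cmath.exp(-2j * math.pi * e / N)

def solve(A, b):
    """Solve A x = b (complex) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def recover_pulses(X, N, K):
    """Locations and weights of K pulses from X[k], k = -K..K (a dict keyed by k)."""
    # Toeplitz system (E5.3-2b): sum_{k=1}^{K} a_k X_{m-k} = -X_m, m = 1..K.
    T = [[X[m - k] for k in range(1, K + 1)] for m in range(1, K + 1)]
    a = [1.0] + solve(T, [-X[m] for m in range(1, K + 1)])
    # A(W_N^n) = sum_k a_k W_N^{-nk} vanishes exactly at the pulse locations.
    score = [abs(sum(a[k] * twiddle(N, -n * k) for k in range(K + 1))) for n in range(N)]
    locs = sorted(sorted(range(N), key=lambda n: score[n])[:K])
    # Vandermonde system: sum_l c_l W_N^{k n_l} = X_k, k = 0..K-1.
    V = [[twiddle(N, k * n_l) for n_l in locs] for k in range(K)]
    weights = solve(V, [X[k] for k in range(K)])
    return locs, [w.real for w in weights]

N, K = 32, 2
true_locs, true_weights = [5, 19], [1.5, -0.7]
X = {k: sum(c * twiddle(N, k * n) for n, c in zip(true_locs, true_weights))
     for k in range(-K, K + 1)}
print(recover_pulses(X, N, K))
```

Only the $2K + 1$ coefficients $X_{-K}, \ldots, X_K$ are touched, which is the point of part (i); with noisy data one would replace the exact grid search and square solves by more robust variants.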
(ii) For each $\ell = 0, 1, \ldots, N/M - 1$,

$$\begin{aligned}
y_\ell &= \sum_{n=0}^{N-1} x_n\,h_{n-\ell M} \overset{(a)}{=} \frac{1}{N}\sum_{n=0}^{N-1} x_n\sum_{m=-K}^{K} W_N^{-m(n-\ell M)} \\
&\overset{(b)}{=} \frac{1}{N}\sum_{m=-K}^{K} W_{N/M}^{m\ell}\sum_{n=0}^{N-1} x_n\,W_N^{-mn} \overset{(c)}{=} \frac{1}{N}\sum_{m=-K}^{K} X_{-m}\,W_{N/M}^{m\ell},
\end{aligned} \qquad (E5.3\text{-}3)$$

where (a) follows from the definition of $h_n$; in (b) we exchanged the order of summations and used $W_N^{m\ell M} = W_{N/M}^{m\ell}$; and in (c) we used the DFT of the second summation.

We now find the DFT of $y_\ell$. For each $k = 0, 1, \ldots, N/M - 1$,

$$Y_k = \sum_{\ell=0}^{N/M-1} y_\ell\,W_{N/M}^{k\ell} \overset{(a)}{=} \frac{1}{N}\sum_{\ell=0}^{N/M-1}\sum_{m=-K}^{K} X_{-m}\,W_{N/M}^{m\ell}\,W_{N/M}^{k\ell} \overset{(b)}{=} \frac{1}{N}\sum_{m=-K}^{K} X_{-m}\sum_{\ell=0}^{N/M-1} W_{N/M}^{(m+k)\ell} \overset{(c)}{=} \frac{1}{N}\cdot\frac{N}{M}\,X_k = \frac{1}{M}\,X_k,$$

where (a) follows from (E5.3-3); in (b) we exchanged the order of summations and multiplied the two complex exponentials; (c) follows from the orthogonality of roots of unity (2.277c), making the second sum zero except for $m = -k$, when it equals $N/M$, where $k = 0, 1, \ldots, \min\{K, N/M - 1\}$. By assumption, $N/M \ge 2K + 1 > K$, and thus

$$X_k = M\,Y_k, \qquad k = 0, 1, \ldots, K.$$
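The relation $X_k = M\,Y_k$ can be checked numerically for one hypothetical configuration ($N = 60$, $M = 4$, $K = 3$, so $N/M = 15 \ge 2K + 1$). The sketch builds $x_n$, the ideal lowpass $h_n$, the samples $y_\ell$, and their length-$N/M$ DFT with direct sums; it also checks the mirrored relation $X_{-k} = M\,Y_{N/M-k}$, which follows from the same orthogonality argument. The function names are illustrative.

```python
import cmath
import math

def twiddle(N, e):
    """W_N^e with W_N = exp(-j 2 pi / N)."""
    return cmath.exp(-2j * math.pi * e / N)

def sample_and_compare(N, M, K, locs, weights):
    """Return the length-N DFT X of the pulse stream and the length-N/M DFT Y of y."""
    x = [0.0] * N
    for n_l, c_l in zip(locs, weights):
        x[n_l] = c_l
    X = [sum(x[n] * twiddle(N, k * n) for n in range(N)) for k in range(N)]
    # Ideal lowpass h_n = (1/N) sum_{k=-K}^{K} W_N^{-kn}.
    h = [sum(twiddle(N, -k * n) for k in range(-K, K + 1)) / N for n in range(N)]
    L = N // M
    # Filter with the time-reversed h and keep every M-th output: y_l = sum_n x_n h_{n - lM}.
    y = [sum(x[n] * h[(n - l * M) % N] for n in range(N)) for l in range(L)]
    Y = [sum(y[l] * twiddle(L, k * l) for l in range(L)) for k in range(L)]
    return X, Y

N, M, K = 60, 4, 3
X, Y = sample_and_compare(N, M, K, locs=[7, 22, 41], weights=[1.0, -0.5, 2.0])
for k in range(K + 1):
    print(k, X[k], M * Y[k])
```

Together with part (i), this shows that $N/M = 2K + 1$ samples already suffice to recover the $K$ pulses, far fewer than the $N$ samples of the original stream when $K$ is small.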

5.4. Quantization Intervals

Let $x$ be a real-valued random variable with PDF $f_x$. This random variable is quantized into $K$ representation values $\hat{x}_k$, $k = 0, 1, \ldots, K-1$, with $\hat{x}_k < \hat{x}_{k+1}$ for all $k$. Show that the nearest neighbor assignment,

$$x \to \hat{x}_k \quad\text{when}\quad |x - \hat{x}_k| < |x - \hat{x}_i| \ \text{for all}\ i \ne k,$$

minimizes the mean-squared error $E[(x - \hat{x})^2]$. This leads to a midpoint splitting rule, where values $x \in [\hat{x}_k, \hat{x}_{k+1}]$ are assigned to $\hat{x}_k$ if smaller than $(\hat{x}_k + \hat{x}_{k+1})/2$ and to $\hat{x}_{k+1}$ otherwise. (In multiple dimensions, this leads to Voronoi cells.)

Solution: Denote the quantization mapping by $q : \mathbb{R} \to \{\hat{x}_k\}_{k=0}^{K-1}$ so that $\hat{x} = q(x)$. Since

$$E[(x - \hat{x})^2] = \int_{-\infty}^{\infty}(t - q(t))^2 f_x(t)\,dt$$

and $f_x$ is nonnegative, the mean-squared error is minimized by minimizing $(t - q(t))^2$ at each point $t$. For any $t \in \mathbb{R}$, out of the choices for $q(t)$ in $\{\hat{x}_k\}_{k=0}^{K-1}$, the squared error $(t - q(t))^2$ is minimized by choosing $q(t)$ to be the representation value $\hat{x}_k$ closest to $t$. Specifically, for $t \in [\hat{x}_k, \hat{x}_{k+1}]$, the distance from $\hat{x}_k$ is an increasing function of $t$ and the distance from $\hat{x}_{k+1}$ is a decreasing function; these distances are equal at the midpoint, where $q(t)$ changes from $\hat{x}_k$ to $\hat{x}_{k+1}$.



Exercises 

5.1. Basic Properties of Legendre Polynomials

(i) Let $V = \{v_0, v_1, \ldots\}$ be the set of polynomials orthogonal on $L^2([a, b])$; each $v_k$ has degree at most $k$. Express $V$ in terms of Legendre polynomials.
(ii) Prove the following recurrence relation for Legendre polynomials:

$$L_{k+1}(t) = \frac{2k+1}{k+1}\,t\,L_k(t) - \frac{k}{k+1}\,L_{k-1}(t), \qquad k \in \mathbb{Z}^+.$$

5.2. Orthogonal Polynomials and Nesting of Polynomial Subspaces

Let $V = \{v_0, v_1, \ldots\}$ be a set of orthogonal polynomials; each $v_k$ has degree at most $k$. Let $p$ be a polynomial of degree $m$. Prove that $\langle p, v_k\rangle = 0$ for every $k > m$.

5.3. Roots of Orthogonal Polynomials

Let $V = \{v_0, v_1, \ldots\}$ be the set of polynomials orthogonal on $[a, b]$; each $v_k$ has degree at most $k$. Prove that for any $k$, $v_k(t)$ has exactly $k$ real, distinct roots in the open interval $(a, b)$.

(Hint: [TBD, Atkinson p. 213-214].)

5.4. Poor $\mathcal{L}^\infty$ Behavior of Lagrange Interpolation

Let $x(t) = (1 + t^2)^{-1}$ and let $p_K(t)$ denote the Lagrange interpolation of $K + 1$ samples of $x$ evenly spaced over $[-5, 5]$.

(i) Bound $|x(t) - p_K(t)|$ over $[-5, 5]$ by evaluating (5.7b).
(ii) Show that the $\mathcal{L}^\infty$ error bound from (i) grows without bound as $K$ is increased.
(iii) In Matlab, plot $p_K$ for a few values of $K$. Observe empirically that $\|x - p_K\|_\infty$ grows without bound as $K$ is increased.
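Part (iii) asks for Matlab; the following NumPy sketch runs the same experiment in Python. It evaluates the interpolant with the barycentric form of the Lagrange formula (an implementation choice made here for numerical stability, not something the exercise prescribes) and reports the maximum error on a fine grid. The helper names are illustrative.

```python
import numpy as np

def lagrange_interp(nodes, values, t):
    """Evaluate the Lagrange interpolating polynomial through (nodes, values) at t,
    using the barycentric formula."""
    diffs = nodes[:, None] - nodes[None, :]
    np.fill_diagonal(diffs, 1.0)
    w = 1.0 / diffs.prod(axis=1)                  # barycentric weights
    d = t[:, None] - nodes[None, :]
    at_node = d == 0.0
    d[at_node] = 1.0                              # placeholder, overwritten below
    terms = w / d
    p = (terms @ values) / terms.sum(axis=1)
    rows, cols = np.nonzero(at_node)
    p[rows] = values[cols]                        # interpolant equals the data at nodes
    return p

def runge(t):
    return 1.0 / (1.0 + t ** 2)

def max_interp_error(K, n_eval=2001):
    nodes = np.linspace(-5, 5, K + 1)             # K + 1 evenly spaced samples
    t = np.linspace(-5, 5, n_eval)
    return np.max(np.abs(runge(t) - lagrange_interp(nodes, runge(nodes), t)))

for K in (4, 8, 12, 16, 20):
    print(K, max_interp_error(K))
```

The printed errors grow quickly with $K$, with the largest deviations near the ends of the interval.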

5.5. Lagrange Interpolation with Coincident Nodes
Given is the Lagrange interpolation formula (5.5).

(i) Write it for two nodes, one at 0 and the other at $\epsilon > 0$. Using the definition of the derivative, prove that in the limit, when $\epsilon \to 0$, the Lagrange interpolation yields the first-order Taylor series expansion around 0.

(ii) Generalize (i) to arbitrary order.
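For (i), with nodes $0$ and $\epsilon$ the interpolant is $p(t) = x(0)\,(\epsilon - t)/\epsilon + x(\epsilon)\,t/\epsilon = x(0) + t\,(x(\epsilon) - x(0))/\epsilon$, whose slope is a difference quotient. A short check, using $x(t) = e^t$ as an arbitrary test function chosen here for illustration, shows the interpolant approaching the first-order Taylor polynomial $1 + t$ as $\epsilon$ shrinks.

```python
import math

def two_node_lagrange(x, eps, t):
    """Lagrange interpolant of x through the nodes 0 and eps, evaluated at t."""
    return x(0.0) * (eps - t) / eps + x(eps) * t / eps

x = math.exp                 # test function (assumption for illustration)
t = 0.5
gaps = []
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    gap = abs(two_node_lagrange(x, eps, t) - (1.0 + t))   # distance to x(0) + t x'(0)
    gaps.append(gap)
    print(f"eps={eps:g}  |p(t) - taylor(t)| = {gap:.2e}")
```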

5.6. Poor $\mathcal{L}^\infty$ Behavior of Taylor Series

Let $x(t) = (1 + t^2)^{-1}$ and let $p_K(t)$ denote the Taylor series approximation over $[-5, 5]$. This function is infinitely differentiable but potentially difficult to approximate with a polynomial [6].

(i) Bound $|x(t) - p_K(t)|$ over $[-5, 5]$, with the Taylor series approximating around 0.
(ii) Bound $|x(t) - p_K(t)|$ over $[-5, 5]$ by evaluating (5.10b), and compare it to (i).
(iii) Show that the $\mathcal{L}^\infty$ error bounds from (i) and (ii) grow without bound as $K$ is increased.
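Around $t = 0$, the Taylor series of $x(t) = (1+t^2)^{-1}$ is the geometric series $\sum_{n \ge 0} (-1)^n t^{2n}$, which converges only for $|t| < 1$ (the poles at $t = \pm j$ limit the radius of convergence). The sketch below evaluates the degree-$K$ Taylor polynomial at a few points of $[-5, 5]$: the error shrinks inside the unit interval and explodes outside it.

```python
def taylor_poly(t, K):
    """Degree-K Taylor polynomial of 1/(1 + t^2) around 0: sum of (-1)^n t^(2n), 2n <= K."""
    return sum((-1) ** n * t ** (2 * n) for n in range(K // 2 + 1))

def x(t):
    return 1.0 / (1.0 + t * t)

for K in (4, 8, 16):
    errs = [abs(x(t) - taylor_poly(t, K)) for t in (0.5, 2.0, 5.0)]
    print(K, errs)
```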

5.7. Hermite Interpolation 

Let $x^{(i)}(t_k)$ be the values of a real-valued function and its derivatives for $k = 0, 1, \ldots, L$, and $i = 0, 1, \ldots, d_k$. Prove that a polynomial of degree $K = \left(\sum_{k=0}^{L} (d_k + 1)\right) - 1$ matching all these values can be uniquely determined. Find an error bound for $e(t) = x(t) - p_K(t)$.
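One constructive view of existence and uniqueness: the $K + 1$ conditions form a square linear system in the monomial coefficients (a confluent Vandermonde system), and the task is to show that it is nonsingular. The sketch below builds and solves that system with NumPy using a hypothetical helper `hermite_coefficients`. Monomials are used only for brevity; they become ill-conditioned for large $K$.

```python
import numpy as np
from math import factorial

def hermite_coefficients(nodes, derivs):
    """Coefficients (ascending powers) of the polynomial p with p^(i)(t_k) = derivs[k][i].

    nodes  -- list of distinct points t_k
    derivs -- derivs[k] = [x(t_k), x'(t_k), ..., x^(d_k)(t_k)]
    The degree is K = sum(d_k + 1) - 1.
    """
    K = sum(len(d) for d in derivs) - 1
    rows, rhs = [], []
    for t_k, vals in zip(nodes, derivs):
        for i, v in enumerate(vals):
            # i-th derivative of t^n at t_k: n!/(n-i)! * t_k^(n-i) if n >= i, else 0
            rows.append([factorial(n) // factorial(n - i) * t_k ** (n - i) if n >= i else 0.0
                         for n in range(K + 1)])
            rhs.append(v)
    return np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))

# A cubic is recovered exactly from values and slopes at two nodes (K = 3).
c = hermite_coefficients([0.0, 1.0], [[0.0, 0.0], [1.0, 3.0]])   # x(t) = t^3
print(np.round(c, 12))
```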

5.8. Proof of Weierstrass Approximation Theorem 

Let $x(t)$ be continuous on $[0, 1]$ and let $\epsilon > 0$. For each $K \in \mathbb{Z}^+$, define the Bernstein polynomial of degree $K$ as
$$p_K(t) = \sum_{k=0}^{K} \binom{K}{k}\, x\!\left(\frac{k}{K}\right) t^k (1 - t)^{K-k}.$$
Show that
$$\lim_{K \to \infty} \|x - p_K\|_\infty = 0.$$
Using an appropriate change of variables from $[0, 1]$ to the general interval $[a, b]$, show that this proves Theorem 5.3 [6].
(Hint: TBD.)
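A quick numerical illustration (not a proof) of the convergence, using the continuous but nondifferentiable test function $x(t) = |t - 1/2|$, chosen here for illustration. For this function the sup-norm error decays slowly, roughly like $K^{-1/2}$.

```python
import numpy as np
from math import comb

def bernstein(x, K, t):
    """Degree-K Bernstein polynomial of x on [0, 1], evaluated at points t."""
    k = np.arange(K + 1)
    binom = np.array([comb(K, j) for j in k], dtype=float)
    basis = binom * t[:, None] ** k * (1 - t[:, None]) ** (K - k)   # shape (len(t), K+1)
    return basis @ x(k / K)

x = lambda t: np.abs(t - 0.5)          # continuous test function (assumption)
t = np.linspace(0, 1, 1001)
errors = {K: np.max(np.abs(x(t) - bernstein(x, K, t))) for K in (10, 40, 160)}
print(errors)
```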

5.9. Near Minimax Approximation 
Prove that
$$\max_{t \in [-1, 1]} \prod_{k=0}^{K} |t - t_k|$$
is minimized by choosing $\{t_k\}_{k=0}^{K}$ to be the $K + 1$ zeros of the Chebyshev polynomial $T_{K+1}(t)$. Show furthermore that the interpolation error bound (5.7b) becomes
$$|x(t) - p_{\text{interp}}(t)| \le \frac{1}{2^K (K+1)!} \max_{\xi \in [-1, 1]} |x^{(K+1)}(\xi)|$$
for approximation of $x(t)$ on $[-1, 1]$ with a polynomial of degree at most $K$.

(Hint: This can be deduced from the minimax approximation of $t^{K+1}$ by $t^{K+1} - 2^{-K} T_{K+1}(t)$, with approximation error $2^{-K}$.)
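The node polynomial can be compared numerically for the two node choices. With the zeros $t_k = \cos\big((2k+1)\pi/(2K+2)\big)$, $k = 0, \ldots, K$, of $T_{K+1}$, the maximum of $\prod_k |t - t_k|$ on $[-1, 1]$ equals $2^{-K}$, which is noticeably smaller than for equispaced nodes. $K = 10$ is an arbitrary choice.

```python
import numpy as np

def node_poly_max(nodes, n_eval=20001):
    """max over [-1, 1] of prod_k |t - t_k|, estimated on a fine grid (endpoints included)."""
    t = np.linspace(-1, 1, n_eval)
    return np.max(np.prod(np.abs(t[:, None] - nodes[None, :]), axis=1))

K = 10
cheb = np.cos((2 * np.arange(K + 1) + 1) * np.pi / (2 * K + 2))   # zeros of T_{K+1}
equi = np.linspace(-1, 1, K + 1)
print(node_poly_max(cheb), 2.0 ** -K, node_poly_max(equi))
```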

5.10. Truncation as Orthogonal Projection 

Using the projection theorem, Theorem 1.26, show that truncation of the ideal filter (5.26) is the least-squares approximation solution.
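A numerical sanity check under assumptions chosen here for illustration: an ideal lowpass filter with cutoff $\omega_c = \pi/2$, written in the standard form $h_n = \sin(\omega_c n)/(\pi n)$ (the normalization in (5.26) may differ), and real symmetric approximations of length $2N + 1$ with $N = 10$. Fitting the frequency response in the least-squares sense on a dense midpoint grid over $[0, \pi]$ recovers the truncated ideal coefficients.

```python
import numpy as np

N, M = 10, 4096
wc = np.pi / 2
omega = (np.arange(M) + 0.5) * np.pi / M          # midpoint grid on [0, pi]
H_ideal = (omega < wc).astype(float)              # ideal lowpass response

# Real symmetric filter: H(w) = h_0 + 2 * sum_{n=1}^{N} h_n cos(n w)
A = np.column_stack([np.ones(M)] + [2 * np.cos(n * omega) for n in range(1, N + 1)])
h_ls, *_ = np.linalg.lstsq(A, H_ideal, rcond=None)

n = np.arange(1, N + 1)
h_trunc = np.concatenate(([wc / np.pi], np.sin(wc * n) / (np.pi * n)))
print(np.max(np.abs(h_ls - h_trunc)))
```

This agrees with the projection argument: by Parseval, the squared error of any length-$(2N+1)$ filter equals the squared deviation on the kept coefficients plus the energy of the discarded ones, so it is minimized by keeping the ideal coefficients unchanged.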

5.11. Spline Spaces 

Given is the $N$th-order B-spline as in (5.32).

(i) Show that $\beta^{(N)}(t)$ has $N - 1$ continuous derivatives, as well as an $N$th derivative everywhere except at integers.
(ii) Show that functions in $S_N$ belong to $C^{N-1}$, as well as to $C^N$ except at integers.

5.12. Dual Spline Bases 

Consider (5.31b) with $N = 2$, the quadratic spline $\beta^{(2)}(t)$. Denote by $a_n$ its deterministic autocorrelation evaluated at integers.

(i) Show that $A^{(2)}(e^{j\omega}) > 0$.
(ii) Find the spectral root of $A^{(2)}(z)$.
(iii) Find the inverse $C(z) = 1/A^{(2)}(z)$ and its stable, two-sided inverse $z$-transform $c_n$ such that
$$(c * a^{(2)})_n = \delta_n.$$
(iv) Verify that
$$\tilde{\beta}^{(2)}(t) = \sum_{k \in \mathbb{Z}} c_k\, \beta^{(2)}(t - k)$$
satisfies the biorthogonality relation
$$\langle \tilde{\beta}^{(2)}(t), \beta^{(2)}(t - n) \rangle_t = \delta_n.$$
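A numerical companion to (i) and (iii). It assumes, without confirmation from (5.31b) as shown here, that $\beta^{(2)}$ is the causal quadratic B-spline on $[0, 3)$ obtained by convolving three unit boxes. Under that assumption the autocorrelation samples are $a_0 = 66/120$, $a_{\pm 1} = 26/120$, $a_{\pm 2} = 1/120$, so $A^{(2)}(e^{j\omega}) = (66 + 52\cos\omega + 2\cos 2\omega)/120 \ge 16/120 > 0$. The coefficients $c_n$ can then be obtained by sampling $1/A^{(2)}$ on a DFT grid; they decay geometrically, so a short grid suffices.

```python
import numpy as np

def quadratic_bspline(t):
    """Causal quadratic B-spline on [0, 3) (assumed normalization: unit area)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    m0 = (t >= 0) & (t < 1)
    m1 = (t >= 1) & (t < 2)
    m2 = (t >= 2) & (t < 3)
    out[m0] = t[m0] ** 2 / 2
    out[m1] = (-2 * t[m1] ** 2 + 6 * t[m1] - 3) / 2
    out[m2] = (3 - t[m2]) ** 2 / 2
    return out

# Deterministic autocorrelation a_n = <beta(t), beta(t - n)> via a midpoint rule.
dt = 1e-4
t = np.arange(0, 3, dt) + dt / 2
a = np.array([np.sum(quadratic_bspline(t) * quadratic_bspline(t - n)) * dt
              for n in range(-2, 3)])                     # a_{-2}, ..., a_2
print(a * 120)                                            # close to [1, 26, 66, 26, 1]

# A(e^{jw}) = a_0 + 2 a_1 cos w + 2 a_2 cos 2w; c_n from samples of 1/A on a DFT grid.
M = 64
w = 2 * np.pi * np.arange(M) / M
A = a[2] + 2 * a[3] * np.cos(w) + 2 * a[4] * np.cos(2 * w)
c = np.fft.fftshift(np.real(np.fft.ifft(1 / A)))         # c_n for n = -32, ..., 31
delta = np.convolve(c, a, mode="same")                    # (c * a)_n for n = -32, ..., 31
print(A.min() > 0, np.round(delta[28:37], 8))
```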

5.13. Battle-Lemarié Wavelets

Consider (5.31b) with $N = 2$, the quadratic spline $\beta^{(2)}(t)$.

(i) Form the following $2\pi$-periodic function:
$$N(e^{j\omega}) = \sum_{k \in \mathbb{Z}} |B^{(2)}(\omega + 2k\pi)|^2. \qquad \text{(P5.13-1)}$$

(ii) Form the following function:
$$\Phi(\omega) = \frac{B^{(2)}(\omega)}{\sqrt{N(e^{j\omega})}}. \qquad \text{(P5.13-2)}$$
Prove that
$$\sum_{k \in \mathbb{Z}} |\Phi(\omega + 2k\pi)|^2 = 1. \qquad \text{(P5.13-3)}$$

(iii) Find the inverse Fourier transform $\varphi(t)$ of $\Phi(\omega)$ and prove that
$$\langle \varphi(t), \varphi(t - n) \rangle_t = \delta_n. \qquad \text{(P5.13-4)}$$

The resulting function $\varphi(t)$ is called a Battle-Lemarié scaling function.$^{97}$ What we have above is a general procedure for orthogonalizing splines, leading to a function $\varphi(t)$ that, with its integer shifts, forms an orthonormal basis for $S_2$.
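One way to evaluate (P5.13-1): by the Poisson summation formula, $N(e^{j\omega})$ equals the DTFT of the integer autocorrelation samples of $\beta^{(2)}$. Assuming again the unit-area quadratic B-spline, for which $|B^{(2)}(\omega)|^2 = \big(\sin(\omega/2)/(\omega/2)\big)^6$, the sketch below compares a truncated version of the sum with $(66 + 52\cos\omega + 2\cos 2\omega)/120$. Because $N(e^{j\omega})$ is $2\pi$-periodic, dividing by it makes the periodized sum in (P5.13-3) equal 1.

```python
import numpy as np

def N_sum(w, k_max=2000):
    """Truncated N(e^{jw}) = sum_k |B(w + 2 k pi)|^2 for the quadratic B-spline."""
    k = np.arange(-k_max, k_max + 1)
    u = (w[:, None] + 2 * np.pi * k[None, :]) / (2 * np.pi)
    return np.sum(np.sinc(u) ** 6, axis=1)        # np.sinc(u) = sin(pi u) / (pi u)

w = np.linspace(-np.pi, np.pi, 257)
closed_form = (66 + 52 * np.cos(w) + 2 * np.cos(2 * w)) / 120
print(np.max(np.abs(N_sum(w) - closed_form)))
```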

5.14. Computing Inner Products with Splines 

Consider a function $x(t)$ that is zero outside of $[0, L]$. We want to compute
$$y_n = \langle x(t), \beta^{(N)}(t - n) \rangle_t,$$
where $\beta^{(N)}(t)$ is the causal $N$th-order B-spline.
(i) Show that, for $N = 0$,
$$y_n = \begin{cases} \langle x(t), \beta^{(0)}(t - n) \rangle_t, & 0 \le n \le L - 1;\\ 0, & \text{otherwise}, \end{cases} \;=\; \begin{cases} X_{n+1} - X_n, & 0 \le n \le L - 1;\\ 0, & \text{otherwise}, \end{cases}$$
where
$$X_n = \int_0^n x(t)\, dt$$
is the primitive of $x(t)$ evaluated at integers.
(ii) Generalize the above result to the inner product with an $N$th-order spline.
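For (i), the box $\beta^{(0)}(t - n)$ is the indicator of $[n, n+1)$ (assuming the causal, unit-height box), so each inner product is a difference of the primitive. A small check with the hypothetical test function $x(t) = t$ on $[0, L]$, $L = 5$, for which $X_n = n^2/2$ and hence $y_n = n + 1/2$:

```python
import numpy as np

L = 5
x = lambda t: np.where((t >= 0) & (t <= L), t, 0.0)   # test function, zero outside [0, L]

def primitive(n):
    """X_n = integral of x from 0 to n (closed form for x(t) = t)."""
    return n ** 2 / 2

def y_direct(n, num=100_000):
    """<x(t), beta0(t - n)> computed by a midpoint rule over [n, n + 1)."""
    dt = 1.0 / num
    t = n + (np.arange(num) + 0.5) * dt
    return np.sum(x(t)) * dt

y_from_primitive = [primitive(n + 1) - primitive(n) for n in range(L)]
y_numeric = [y_direct(n) for n in range(L)]
print(y_from_primitive)     # [0.5, 1.5, 2.5, 3.5, 4.5]
```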



97 We defer the discussion of wavelets to Chapter 12.
5.15. Polynomial Reproduction by B-Splines and Their Shifts
For the $N$th-order B-spline, $\beta^{(N)}(t)$, prove (5.65). (Hint: You can use the fact that $\sum_{n \in \mathbb{Z}} \beta^{(N)}(t - n) = 1$.)

(i) Take $k$ derivatives of (5.65) using (5.55) to show that the result is a constant, upper bounding the degree of the polynomial.
(ii) Take $k$ integrals of (5.65) using (5.59) to show that the result is a polynomial of degree $k$.

5.16. Linear Approximation in a Biorthogonal Basis 

Extend (5.73)-(5.74c) to the case of truncated biorthogonal bases.

5.17. Linear versus Nonlinear Approximation 

Consider two uncorrelated jointly Gaussian random variables $x_1 \sim \mathcal{N}(0, 1)$ and $x_2 \sim \mathcal{N}(0, 1)$. Similarly to Figure 5.25, compare linear versus nonlinear approximation, and the resulting expected quadratic error. Without any loss of generality, you may use the standard basis $\varphi_0 = [1 \;\; 0]^\top$, $\varphi_1 = [0 \;\; 1]^\top$.
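A Monte Carlo sketch of the comparison, keeping one of the two coefficients in the standard basis. Linear approximation always keeps the same coefficient (here $x_1$), so its expected squared error is $E[x_2^2] = 1$. Nonlinear approximation keeps the coefficient of larger magnitude and leaves $E[\min(x_1^2, x_2^2)]$. For independent standard Gaussians this equals $1 - 2/\pi \approx 0.363$, a value derived here for comparison rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((1_000_000, 2))        # columns: x_1, x_2 (independent N(0, 1))

# Linear: always keep x_1 and discard x_2.
err_linear = np.mean(x[:, 1] ** 2)

# Nonlinear: keep whichever coefficient has the larger magnitude.
err_nonlinear = np.mean(np.min(x ** 2, axis=1))

print(err_linear, err_nonlinear, 1 - 2 / np.pi)
```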

5.18. Quantizer Performance 

Show that (5.96) holds, with a different function $g$, for any family of quantizers that can be described by its operation on a normalized variable, not just for optimal quantizers.



a3.0 [October 2011] CC by-nc-nd Comments to book-errata@FourierAndWavelets.org



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovacevic, and V. K. Goyal 






Chapter 5. Approximation and Compression 






Chapter 6

Time-Frequency Localization

6.3 Localization for Sequences

Historical Remarks

Further Reading

Exercises with Solutions

Exercises

In Part II, we will construct various sets of vectors $\{\varphi_k\}$ to use for signal analysis and synthesis. For a representation
$$x = \sum_k \alpha_k \varphi_k$$
to exist for any $x$, we need $\{\varphi_k\}$ to be complete. For the representation to be unique, $\{\varphi_k\}$ must be a basis, and we will often construct $\{\varphi_k\}$ to also be an orthonormal set. However, these properties are not enough for the set $\{\varphi_k\}$ to be useful. For most applications, the utility of a representation is tied to the time, frequency, scale, and resolution properties of the $\varphi_k$s. Computational efficiency is also a concern; we begin to address it in Chapter 7.

Figure 6.1: Musical score as an illustration of a time-frequency plane [167]. (a) Musical score. (b) Time-domain functions. (c) Time-frequency plane for (b).

Our primary goal in this chapter is to explore the time, frequency, scale, and resolution properties of individual basis vectors, as these will be our tools for extracting information about a given signal (function or sequence). We call the process of extracting information probing, and we perform it by computing an inner product of the signal with a probe $\varphi$. The result (measurement) of probing is the coefficient, for functions,
$$\alpha = \langle x, \varphi \rangle = \int_{-\infty}^{\infty} x(t)\, \varphi^*(t)\, dt \overset{(a)}{=} \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, \Phi^*(\omega)\, d\omega = \frac{1}{2\pi} \langle X, \Phi \rangle_\omega, \qquad (6.1)$$
where (a) follows from the generalized Parseval's equality (3.69b). The two integral expressions for $\alpha$ are suggestive of probing $x$ for its characteristics in time and in frequency. Because of the uncertainty principle, the probe will have limited simultaneous localization in time and frequency, so what we learn about $x$ will be limited as well. The principle holds across various ways to measure time and frequency localization with a single probe, and for both discrete and continuous time. Scaling in time or frequency trades localization in one domain for localization in the other, while shifting and modulation leave both unchanged. Overall, the uncertainty principle in its various guises helps us understand time and frequency representations and the localization properties of individual vectors, contributing to our intuition for what can be expected of a basis.
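Equation (6.1) says that probing can be carried out equally well in time or in frequency. A discrete analog, chosen here as an illustration rather than the continuous-time statement itself, uses the DFT, for which the generalized Parseval equality reads $\sum_n x_n \varphi_n^* = \frac{1}{N}\sum_k X_k \Phi_k^*$; the factor $1/N$ plays the role of $1/(2\pi)$. The signal and the windowed-exponential probe are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)            # arbitrary signal
n = np.arange(N)
phi = np.exp(-0.5 * ((n - 100) / 8.0) ** 2) * np.exp(1j * 0.3 * n)  # windowed exponential probe

alpha_time = np.vdot(phi, x)                            # sum_n x_n conj(phi_n)
alpha_freq = np.vdot(np.fft.fft(phi), np.fft.fft(x)) / N
print(np.allclose(alpha_time, alpha_freq))              # True
```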

6.1 Introduction 

For certain simple signals, time and frequency properties are quite intuitive. Think of a note on a musical score. It has a certain frequency (for example, the A above middle C has a frequency of 440 Hz), it has a start time, and its value (whole note, half note, quarter note, and so on) indicates its relative duration. We can think of the musical score as a time-frequency plane with a logarithmic frequency axis, and of notes as rectangles in that plane, with horizontal extent determined by start and end times and vertical position related in some way to frequency, as in Figure 6.1.

Localization in Time and Frequency Time and frequency views of a signal are intertwined in several ways. For example, the Fourier transform gives a precise sense of interchangeability: if $x(t)$ has the Fourier transform $X(\omega)$, then $X(t)$ has the Fourier transform $2\pi x(-\omega)$. More relevantly to this chapter, various forms of the uncertainty principle determine the trade-off between fine localization in the two domains: signals finely localized in time will be coarsely localized in frequency and, conversely, signals finely localized in frequency will be coarsely localized in time. The uncertainty principle also bounds the product of spreads in time and frequency, with the lower bound reached by Gaussian functions.

Scale Another natural notion for signals is scale. For example, given a portrait of a person, recognizing that person should not depend on whether they occupy one-tenth or one-half of the image;$^{98}$ thus, image recognition should be scale invariant. Signals that are scaled versions of each other are often considered equivalent. However, this scale invariance is a purely continuous-time property, since discrete-time sequences cannot be rescaled easily. For example, downsampling by a factor of $N$ is in general a lossy operation, while upsampling by a factor of $N$ introduces spurious zeros. What we will consider natural scaling operations in the discrete domain are sampling (lowpass prefiltering followed by downsampling) and interpolation (upsampling followed by lowpass postfiltering), operations we introduced in Chapter 3.
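A tiny illustration, with arbitrary sequence values, of why plain downsampling and upsampling are not a natural discrete-time rescaling:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
N = 2
down = x[::N]                       # downsampling by N keeps every Nth sample
up = np.zeros(N * len(down))
up[::N] = down                      # upsampling by N inserts N - 1 zeros
print(down)   # [1. 3. 5.]
print(up)     # [1. 0. 3. 0. 5. 0.]
# Downsampling followed by upsampling does not return x: the odd-indexed samples are lost.
print(np.array_equal(up, x))        # False
```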

Resolution A final important notion we discuss is that of resolution. Intuitively, a blurred photograph does not have the resolution of a sharp one, even when the two prints are of the same physical size. Thus, resolution is related to the bandwidth of a signal or, more generally, to the number of degrees of freedom per unit time (or space). Classical bandwidth is then proportional to resolution. Consider the space of bandlimited functions $x(t) \in \mathrm{BL}[-\omega_0/2, \omega_0/2]$ as in Definition 4.12. Then, the sampling theorem, Theorem 4.14, states that samples taken every $T = 2\pi/\omega_0$ seconds, $x_n = x(nT)$, $n \in \mathbb{Z}$, uniquely specify $x(t)$. In other words, real functions of bandwidth $\omega_0$ have $\omega_0/(2\pi)$ real degrees of freedom per unit time.

An example of a set of functions that, while not bandlimited, do have a finite number of degrees of freedom per unit time is the set of piecewise-constant functions from (4.1). Clearly, such an $x(t)$ has 1 degree of freedom per unit time, but its spectrum is not bandlimited, since $x(t)$ is in general discontinuous at every integer. This function is part of a general class of functions belonging to the so-called shift-invariant subspaces we studied in Chapter 4 (see also Exercise 6.1).
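The statement about degrees of freedom can be made concrete with a bandlimited example chosen here for illustration: $x(t) = \mathrm{sinc}^2(t/2)$, with NumPy's normalized $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$, has a spectrum supported in $[-\pi, \pi]$, so $\omega_0 = 2\pi$ and $T = 1$. Its integer samples determine it everywhere through sinc interpolation. The sum below is truncated, so the reconstruction is only approximate.

```python
import numpy as np

x = lambda t: np.sinc(t / 2) ** 2          # bandlimited to |w| <= pi, so T = 1
n = np.arange(-4000, 4001)                 # truncated set of integer sample times
samples = x(n)

def reconstruct(t):
    """Sinc interpolation from the samples x(n), n = -4000, ..., 4000."""
    return np.sum(samples * np.sinc(t - n))

for t in (0.3, 1.7, -2.45):
    print(t, x(t), reconstruct(t))
```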

Interactions Clearly, scale and resolution interact; this is most obvious with images, as illustrated by a drawing by C. Allan Gilbert entitled All is Vanity$^{99}$ in Figure 6.2. It is designed to be perceived as either a woman sitting in front of a mirror (when seen from nearby and at high resolution, as in Figure 6.2(a)), or as a skull (when seen at low resolution, as in Figure 6.2(b)). Figure 6.2(c) illustrates the notion of a change of scale; even though the scale has been halved, resolution remains unchanged, and thus so does our perception of the image, as long as our visual acuity is sufficiently good. Figure 6.2(d) shows a postage-stamp version, where we cannot see the details anymore, and thus only the skull is left.

98 This is true within some bounds, linked to resolution, as we discuss shortly.
99 Another beautiful optical illusion is Salvador Dalí's Gala Contemplating the Mediterranean Sea which at Twenty Meters Becomes a Portrait of Abraham Lincoln; the title says it all.

Figure 6.2: All is Vanity by C. Allan Gilbert illustrates notions of scale and resolution for an image. (a) The original, high-resolution version. (b) A blurred, lower-resolution version; resolution is lower and thus our perception of the image has changed. (c) A scaled version of half size in each dimension; resolution is unchanged and thus our perception of the image remains unchanged, at least as long as our visual acuity can pick up the necessary resolution. (d) A scaled version, where the visual acuity is not sufficient to see the full information in the image.

Filtering can affect resolution as well. If a function of bandwidth $\omega_0$ is perfectly lowpass filtered to $|\omega| < \beta\omega_0/2$, with $0 < \beta < 1$, then its resolution changes from $\omega_0/2\pi$ to $\beta\omega_0/2\pi$. The same holds for sequences, where an ideal lowpass filter with bandwidth $\beta\pi$, $0 < \beta < 1$, reduces the resolution to $\beta/2$ samples per unit time.

Chapter Outline 

The present chapter explores the above basic notions as well as their interactions in detail. Section 6.2 discusses localization concepts for functions, while Section 6.3 does the same for sequences, with a brief mention of finite-length sequences.

6.2 Localization for Functions 
6.2.1 Time Localization 

Consider a function $x(t) \in \mathcal{L}^2(\mathbb{R})$, where $t$ is a time variable. We now discuss localization of the function in time. When the function is finitely supported, its Fourier transform is not (it can only have isolated zeros); that is, a function cannot be perfectly localized in both time and frequency. Even if not of finite support, a function might still decay rapidly as $t \to \pm\infty$. Such decay is necessary for working in $\mathcal{L}^2(\mathbb{R})$; the function must decay faster than $|t|^{-1/2}$ for large $t$ (see Section 3.4.2).

A concise way to describe locality (or lack thereof) is to introduce a spreading measure akin to standard deviation, which requires a normalization so that $|x(t)|^2/\|x\|^2$ can be interpreted as a PDF (this normalization is precisely the same as restricting attention to unit-norm functions). Its mean is then the time center of the function, and its standard deviation is the time spread.

Definition 6.1 (Time center and spread for functions) Let $x(t)$ be a function in $\mathcal{L}^2(\mathbb{R})$ of norm $\|x\|$. Its time center $\mu_t$ and time spread $\Delta_t$ are
$$\mu_t = \frac{1}{\|x\|^2} \int_{-\infty}^{\infty} t\, |x(t)|^2\, dt, \qquad (6.2a)$$
$$\Delta_t^2 = \frac{1}{\|x\|^2} \int_{-\infty}^{\infty} (t - \mu_t)^2\, |x(t)|^2\, dt. \qquad (6.2b)$$



Example 6.1 (Time spreads for functions) Consider the following functions and their time spreads (see Figure 3.9):

(i) The sinc function from (3.75) has $\mu_t = 0$ and infinite $\Delta_t^2$, as $|x(t)|^2$ decays only as $|t|^{-2}$.
(ii) The box function from (3.76) has $\mu_t = 0$ and $\Delta_t^2 = t_0^2/12$.
(iii) The Gaussian function from (3.78) with $\gamma = (2\alpha/\pi)^{1/4}$ is of unit $\mathcal{L}^2$ norm; see (3.13c). It has $\mu_t = 0$ and $\Delta_t^2 = 1/(4\alpha)$.

From the above example, we see that the time spread can vary widely: the box function has a finite spread set by its width, while the sinc function has an infinite spread, showing that the time spread can be unbounded even for widely used functions. Exercise 6.1 explores functions based on the box function and convolutions thereof.
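The values in Example 6.1 can be reproduced by evaluating (6.2a) and (6.2b) with Riemann sums. The sketch assumes a box of width $t_0 = 2$ centered at 0 and a Gaussian with $\alpha = 1$ (parameter values chosen for illustration), for which the expected squared spreads are $t_0^2/12 = 1/3$ and $1/(4\alpha) = 1/4$.

```python
import numpy as np

def time_center_spread(x, t):
    """mu_t and Delta_t^2 from (6.2a)-(6.2b), approximated by Riemann sums on a uniform grid t."""
    dt = t[1] - t[0]
    p = np.abs(x) ** 2
    energy = np.sum(p) * dt
    mu = np.sum(t * p) * dt / energy
    var = np.sum((t - mu) ** 2 * p) * dt / energy
    return mu, var

t = np.linspace(-20, 20, 400001)
t0, alpha = 2.0, 1.0                                           # illustrative parameter values
box = (np.abs(t) <= t0 / 2).astype(float)                      # box of width t0 centered at 0
gauss = (2 * alpha / np.pi) ** 0.25 * np.exp(-alpha * t ** 2)  # unit-norm Gaussian
print(time_center_spread(box, t))     # approximately (0, t0**2 / 12) = (0, 0.333)
print(time_center_spread(gauss, t))   # approximately (0, 1 / (4 * alpha)) = (0, 0.25)
```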

The time center and spread satisfy the following (see Solved Exercise 6.2):

(i) With the shift of a function in time, $y(t) = x(t - t_0)$,
$$\mu_{y,t} = \mu_{x,t} + t_0, \qquad (6.3a)$$
$$\Delta_{y,t} = \Delta_{x,t}, \qquad (6.3b)$$
that is, the time center shifts and the time spread is invariant.
(ii) With the norm-preserving scaling of a function in time, with $a > 0$,
$$y(t) = \sqrt{a}\, x(at), \qquad (6.3c)$$
the time center and spread satisfy
$$\mu_{y,t} = \frac{1}{a}\, \mu_{x,t}, \qquad (6.3d)$$
$$\Delta_{y,t} = \frac{1}{a}\, \Delta_{x,t}, \qquad (6.3e)$$
that is, both the time center and the time spread scale.
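A numerical check of (6.3a)-(6.3e) under illustrative assumptions: $x(t) = e^{-(t-1)^2}$, so that $\mu_{x,t} = 1$ and $\Delta_{x,t} = 1/2$, a shift $t_0 = 3$, and a scale factor $a = 2$. The helper computes $\Delta_t$ itself, not its square.

```python
import numpy as np

def center_and_spread(x, t):
    """mu_t and Delta_t from (6.2a)-(6.2b) on a uniform grid (grid step cancels)."""
    p = np.abs(x) ** 2
    p = p / p.sum()
    mu = np.sum(t * p)
    return mu, np.sqrt(np.sum((t - mu) ** 2 * p))

t = np.linspace(-30, 30, 600001)
x = lambda s: np.exp(-(s - 1.0) ** 2)
t0, a = 3.0, 2.0

mu_x, d_x = center_and_spread(x(t), t)
mu_shift, d_shift = center_and_spread(x(t - t0), t)
mu_scale, d_scale = center_and_spread(np.sqrt(a) * x(a * t), t)
print(mu_x, d_x)            # about 1.0, 0.5
print(mu_shift, d_shift)    # about 4.0, 0.5   as in (6.3a), (6.3b)
print(mu_scale, d_scale)    # about 0.5, 0.25  as in (6.3d), (6.3e)
```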
6.2.2 Frequency Localization

The concept dual to time localization of $x(t)$ is frequency localization of its Fourier transform $X(\omega)$. Similarly to time, when $X(\omega)$ is finitely supported (bandlimited as in Definition 4.12), the function in time is not. If not of finite support, the Fourier transform might still decay rapidly, leading to the notion of the frequency spread. As we did for the time spread, we normalize the frequency spread so as to be able to interpret $|X(\omega)|^2/\|X\|^2$ as a PDF.



Definition 6.2 (Frequency center and spread for functions) Let $x(t)$ be a function in $\mathcal{L}^2(\mathbb{R})$ with the Fourier transform $X(\omega)$ of norm $\|X\|^2 = 2\pi\|x\|^2$. Its frequency center $\mu_\omega$ and frequency spread $\Delta_\omega$ are
$$\mu_\omega = \frac{1}{2\pi\|x\|^2} \int_{-\infty}^{\infty} \omega\, |X(\omega)|^2\, d\omega, \qquad (6.4a)$$
$$\Delta_\omega^2 = \frac{1}{2\pi\|x\|^2} \int_{-\infty}^{\infty} (\omega - \mu_\omega)^2\, |X(\omega)|^2\, d\omega. \qquad (6.4b)$$

Note that the frequency center is 0 for all real functions, because for real $x(t)$ the magnitude $|X(\omega)|$ is even in $\omega$.

Example 6.2 (Frequency spreads for functions) We consider the same functions as in Example 6.1 and their frequency spreads (see Figure 3.9):

(i) The Fourier transform of the sinc function from (3.75) has $\mu_\omega = 0$ and $\Delta_\omega^2 = \omega_0^2/12$.
(ii) The Fourier transform of the box function from (3.76) has $\mu_\omega = 0$ and infinite $\Delta_\omega^2$, as $|X(\omega)|^2$ decays only as $|\omega|^{-2}$.
(iii) The Fourier transform of the Gaussian function from (3.78) has $\mu_\omega = 0$ and $\Delta_\omega^2 = \alpha$.

The frequency center and spread satisfy the following (see Solved Exercise 6.2):

(i) With the shift of a function in frequency, $Y(\omega) = X(\omega - \omega_0)$,
$$\mu_{y,\omega} = \mu_{x,\omega} + \omega_0, \qquad (6.5a)$$
$$\Delta_{y,\omega} = \Delta_{x,\omega}, \qquad (6.5b)$$
that is, the frequency center shifts and the frequency spread is invariant.
(ii) With the scaling of a function in frequency, $Y(\omega) = (1/\sqrt{a})\, X(\omega/a)$,$^{100}$
$$\mu_{y,\omega} = a\, \mu_{x,\omega}, \qquad (6.5c)$$
$$\Delta_{y,\omega} = a\, \Delta_{x,\omega}, \qquad (6.5d)$$
that is, both the frequency center and the frequency spread scale.

100 We choose this scaling to be consistent with the scaling in time, $y(t) = \sqrt{a}\, x(at)$, since then its Fourier transform is scaled in frequency, $Y(\omega) = (1/\sqrt{a})\, X(\omega/a)$, as in (3.58b).
Figure 6.3: The time-frequency plane. (a) The function $x(t)$ with the Fourier transform $X(\omega)$ has an associated Heisenberg box centered at $(\mu_t, \mu_\omega)$ of width $2\Delta_t$ and height $2\Delta_\omega$. (b) The product $|x(t)|^2 |X(\omega)|^2$ as a density plot.

Figure 6.4: Shifting of a function with the Heisenberg box centered at $(\mu_t, \mu_\omega)$ of width $2\Delta_t$ and height $2\Delta_\omega$. (a) Shifting in time by $t_0$, $x(t - t_0)$, shifts the Heisenberg box to $(\mu_t + t_0, \mu_\omega)$; the size remains the same. (b) Shifting in frequency by $\omega_0$ (modulating), $X(\omega - \omega_0)$, shifts the Heisenberg box to $(\mu_t, \mu_\omega + \omega_0)$; the size remains the same.



6.2.3 Uncertainty Principle for Functions

Heisenberg Box Given a function $x(t)$ and its Fourier transform $X(\omega)$, we have just introduced the 4-tuple $(\mu_t, \Delta_t, \mu_\omega, \Delta_\omega)$, describing the function's center in time and frequency $(\mu_t, \mu_\omega)$ and its spread in time and frequency $(\Delta_t, \Delta_\omega)$. It is convenient to show this pictorially (see Figure 6.3), as it conveys the idea that there is a center of mass $(\mu_t, \mu_\omega)$ around which a rectangular box of width $2\Delta_t$ and height $2\Delta_\omega$ is located. The plane on which this is drawn is called the time-frequency plane, and the box is usually called a Heisenberg box, or a time-frequency tile. We adopt the convention of showing only the first quadrant of the time-frequency plane; we reserve the second quadrant for the magnitude of the Fourier transform of the function, $|X(\omega)|$, and the fourth quadrant for the absolute value of the function itself, $|x(t)|$.
Figure 6.5: Scaling of a function with the Heisenberg box centered at $(\mu_t, \mu_\omega)$ of width $2\Delta_t$ and height $2\Delta_\omega$ shifts the Heisenberg box to $(\mu_t/a, a\mu_\omega)$ and scales its width to $2\Delta_t/a$ and its height to $2a\Delta_\omega$.

From our previous discussion on time and frequency shifting, we know that a function obtained by a shift and modulation of $x(t)$,
$$e^{j\omega_0 t}\, x(t - t_0) \;\overset{\text{FT}}{\longleftrightarrow}\; e^{-j(\omega - \omega_0)t_0}\, X(\omega - \omega_0), \qquad (6.6)$$
will have a Heisenberg box of the same size, simply shifted by $t_0$ and $\omega_0$ (dashed boxes in Figure 6.4; we separated the effects of time and frequency shifts for clarity). Similarly, a function obtained by scaling of $x(t)$,
$$\sqrt{a}\, x(at) \;\overset{\text{FT}}{\longleftrightarrow}\; \frac{1}{\sqrt{a}}\, X\!\left(\frac{\omega}{a}\right), \qquad (6.7)$$
will have a Heisenberg box of the same area, scaled appropriately in time and frequency. That is, if $x(t)$ has a Heisenberg box specified by $(\mu_t, \Delta_t, \mu_\omega, \Delta_\omega)$, then the scaled function has a Heisenberg box specified by $(\mu_t/a, \Delta_t/a, a\mu_\omega, a\Delta_\omega)$ (dashed box in Figure 6.5). The effects of shift, modulation, and scaling on the Heisenberg boxes are summarized in Table 6.1.

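$$
\begin{array}{llllll}
\text{Function} & \text{Time center} & \text{Time spread} & \text{Fourier transform} & \text{Freq. center} & \text{Freq. spread} \\
\hline
x(t) & \mu_t & \Delta_t & X(\omega) & \mu_\omega & \Delta_\omega \\
x(t - t_0) & \mu_t + t_0 & \Delta_t & e^{-j\omega t_0} X(\omega) & \mu_\omega & \Delta_\omega \\
e^{j\omega_0 t} x(t) & \mu_t & \Delta_t & X(\omega - \omega_0) & \mu_\omega + \omega_0 & \Delta_\omega \\
\sqrt{a}\, x(at) & \mu_t/a & \Delta_t/a & X(\omega/a)/\sqrt{a} & a\mu_\omega & a\Delta_\omega
\end{array}
$$

Table 6.1: Effect of a shift in time and frequency, as well as scaling in time and frequency, on a Heisenberg box $(\mu_t, \Delta_t, \mu_\omega, \Delta_\omega)$.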
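The entries of Table 6.1 can be spot-checked numerically. The sketch approximates the Fourier transform with the FFT on a long grid and uses $x(t) = e^{-t^2}$ ($\alpha = 1$, so $\Delta_t = 1/2$ and $\Delta_\omega = 1$) with $t_0 = 3$, $\omega_0 = 5$, and $a = 2$. These values and the helper `heisenberg_box` are illustrative assumptions.

```python
import numpy as np

def heisenberg_box(x, t):
    """Return (mu_t, Delta_t, mu_w, Delta_w) for samples x on a uniform grid t.
    The Fourier transform is approximated by the FFT; normalizations cancel."""
    dt = t[1] - t[0]
    pt = np.abs(x) ** 2
    pt = pt / pt.sum()
    mu_t = np.sum(t * pt)
    d_t = np.sqrt(np.sum((t - mu_t) ** 2 * pt))
    w = 2 * np.pi * np.fft.fftfreq(len(t), d=dt)      # angular frequency of each FFT bin
    pw = np.abs(np.fft.fft(x)) ** 2
    pw = pw / pw.sum()
    mu_w = np.sum(w * pw)
    d_w = np.sqrt(np.sum((w - mu_w) ** 2 * pw))
    return mu_t, d_t, mu_w, d_w

t = np.linspace(-50, 50, 2 ** 16, endpoint=False)
gauss = lambda s: np.exp(-s ** 2)
t0, w0, a = 3.0, 5.0, 2.0
cases = {
    "x(t)": gauss(t),
    "x(t - t0)": gauss(t - t0),
    "exp(j w0 t) x(t)": np.exp(1j * w0 * t) * gauss(t),
    "sqrt(a) x(at)": np.sqrt(a) * gauss(a * t),
}
for name, sig in cases.items():
    print(name, np.round(heisenberg_box(sig, t), 4))
```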



Uncertainty Principle So far, we have considered the effect of shifting in time and frequency, as well as scaling, on a Heisenberg box. What about the effect on the area of the Heisenberg box? The intuition, corroborated by what we saw with scaling, is that one can trade time spread for frequency spread. Moreover, from Examples 6.1 and 6.2, we know that a function that is narrow in one domain will be broad in the other. It is thus intuitive that the size of the Heisenberg box is lower bounded, so that no function can be arbitrarily narrow in both time and frequency.

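Theorem 6.3 (Uncertainty principle) Let $x \in \mathcal{L}^2(\mathbb{R})$, with the Fourier transform $X(\omega)$, the time spread $\Delta_t$, and the frequency spread $\Delta_\omega$. Then,
$$\Delta_t^2\, \Delta_\omega^2 \ge \frac{1}{4}, \qquad (6.8)$$
with the lower bound attained by Gaussian functions (3.78).

Proof. We prove the theorem for real functions; see Exercise 6.2 for the complex case. Without loss of generality, assume that $x(t)$ is centered at $t = 0$ and is of unit norm, $\|x\| = 1$; otherwise, we may shift and normalize it appropriately, which changes neither spread. Since $x(t)$ is real, $X(\omega)$ is centered at $\omega = 0$, so $\mu_t = \mu_\omega = 0$.

Suppose $x(t)$ has a square-integrable derivative $x'(t)$; if not, $\Delta_\omega = \infty$, so the statement holds trivially. Consider the function $t\,x(t)\,x'(t)$ and its integral. Using the Cauchy-Schwarz inequality (1.31), we can write
$$\left| \int_{-\infty}^{\infty} t\, x(t)\, x'(t)\, dt \right|^2 \le \int_{-\infty}^{\infty} |t\, x(t)|^2\, dt \int_{-\infty}^{\infty} |x'(t)|^2\, dt \overset{(a)}{=} \int_{-\infty}^{\infty} |t\, x(t)|^2\, dt \; \frac{1}{2\pi} \int_{-\infty}^{\infty} |j\omega X(\omega)|^2\, d\omega = \Delta_t^2\, \Delta_\omega^2, \qquad (6.9)$$
where (a) follows from Parseval's equality (3.69a) and the differentiation-in-time property of the Fourier transform (3.61a), $x'(t) \leftrightarrow j\omega X(\omega)$. We now simplify the left side:
$$\int_{-\infty}^{\infty} t\, x(t)\, x'(t)\, dt \overset{(a)}{=} \frac{1}{2} \int_{-\infty}^{\infty} t\, \big(x^2(t)\big)'\, dt \overset{(b)}{=} \frac{1}{2}\, t\, x^2(t) \Big|_{-\infty}^{\infty} - \frac{1}{2} \int_{-\infty}^{\infty} x^2(t)\, dt \overset{(c)}{=} -\frac{1}{2},$$
where (a) follows from $(x^2(t))' = 2x'(t)x(t)$; (b) from integration by parts; and (c) holds because $x(t) \in \mathcal{L}^2(\mathbb{R})$ implies that it decays faster than $1/\sqrt{|t|}$ for $t \to \pm\infty$, and thus $\lim_{t \to \pm\infty} t\, x^2(t) = 0$ (see Section 3.4.2), while $\int x^2(t)\, dt = \|x\|^2 = 1$. Substituting this into (6.9) yields (6.8).

To find functions that meet the bound with equality, recall that the Cauchy-Schwarz inequality becomes an equality if and only if the two functions are collinear (scalar multiples of each other), or at least one of them is 0. In our case this means $x'(t) = \beta\, t\, x(t)$. Functions satisfying this relation have the form $x(t) = \gamma e^{\beta t^2/2} = \gamma e^{-\alpha t^2}$, where $\alpha = -\beta/2$ must be positive for $x$ to be in $\mathcal{L}^2(\mathbb{R})$; these are the Gaussian functions.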
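The bound (6.8) can be explored numerically for a few real, centered functions. For such functions $\mu_\omega = 0$ and, by the same Parseval step as in (6.9), $\Delta_\omega^2 = \int |x'(t)|^2 dt / \|x\|^2$, so no Fourier transform is needed. The Gaussian attains $1/4$; the triangle and the two-sided exponential, chosen here as comparison examples, give $0.3$ and $0.5$ (values derived for this illustration).

```python
import numpy as np

def spread_product(x_func, t_max=40.0, n=400001):
    """Delta_t^2 * Delta_w^2 for a real, centered function sampled on a uniform grid.

    For real x, mu_w = 0 and, by Parseval and x'(t) <-> j w X(w),
    Delta_w^2 = int |x'(t)|^2 dt / ||x||^2.
    """
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    x = x_func(t)
    energy = np.sum(x ** 2) * dt
    mu_t = np.sum(t * x ** 2) * dt / energy
    delta_t2 = np.sum((t - mu_t) ** 2 * x ** 2) * dt / energy
    delta_w2 = np.sum(np.gradient(x, dt) ** 2) * dt / energy
    return delta_t2 * delta_w2

gaussian = lambda t: np.exp(-t ** 2)                   # attains the bound 1/4
triangle = lambda t: np.maximum(1 - np.abs(t), 0.0)    # (1/10) * 3 = 0.3
two_sided_exp = lambda t: np.exp(-np.abs(t))           # (1/2) * 1 = 0.5
for name, f in [("gaussian", gaussian), ("triangle", triangle), ("exp", two_sided_exp)]:
    print(name, spread_product(f))
```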

The uncertainty principle points to a fundamental limitation of linear time-frequency analysis based on inner products.$^{101}$ If we desire to analyze a function with a probing function to extract information about the function around a location $(\mu_t, \mu_\omega)$, the probing function will necessarily be imprecise. That is, if we wish very precise frequency information about the function, its time location will be uncertain, and vice versa.

101 There exist nonlinear techniques that are not bound by this limiting factor; we discuss these in Chapter 13.

Figure 6.6: Chirp signal and time-frequency analysis with a linearly increasing frequency. (a) An example of a windowed chirp with a linearly increasing frequency. Real and imaginary parts are shown, as well as the raised cosine window. (b) Idealized time-frequency analysis.

Figure 6.7: Analysis of a chirp signal. (a) One of the analyzing functions, consisting of a (complex) modulated window. (b) Magnitude squared of the inner products between the chirp and various shifts and modulates of the analyzing functions, showing a blurred version of the chirp.

Example 6.3 (Chirp function) As an example of time-frequency analysis and the trade-off between time and frequency sharpness, consider a windowed chirp function (a complex exponential with a rising frequency). Instead of a fixed frequency $\omega_0$, the local frequency $\omega_0 t$ grows linearly with time,
$$x(t) = w(t)\, e^{j\omega_0 t^2/2},$$
where $w(t)$ is an appropriate window function.$^{102}$ (The phase $\omega_0 t^2/2$ has derivative $\omega_0 t$, the local frequency.) Figure 6.6(a) shows an example of a chirp function, with an idealized time-frequency analysis in Figure 6.6(b).

As analyzing functions, we choose windowed complex exponentials (but with a fixed frequency). The choice we have is the size of the window (assuming the analyzing functions will cover all shifts and modulations of interest). A long window allows for sharp frequency analysis; however, no frequency is really present other than at one instant. A short window will do justice to the transient nature of frequency in the chirp, but will only give a very approximate frequency analysis due to the uncertainty principle. A compromise between time and frequency sharpness must be sought, and one such possible analysis is shown in Figure 6.7.
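The window-size compromise can be quantified with a discrete-time sketch. All parameters here are assumptions for illustration: a chirp $x_k = e^{j\omega_0 k^2/2}$ with $\omega_0 = 10^{-3}$, probed at its midpoint by Gaussian (rather than raised cosine) windows of width $\sigma$. For a Gaussian window, the spread over probe frequency of $|\langle x, g\rangle|^2$ works out to $\sqrt{1/(2\sigma^2) + \omega_0^2\sigma^2/2}$. The first term dominates for short windows and the second for long ones, with a minimum near $\sigma = 1/\sqrt{\omega_0} \approx 32$.

```python
import numpy as np

def probe_frequency_spread(sigma, omega0=1e-3, n=2048, n_freqs=1024):
    """Mean and spread (std) over probe frequency of |<x, g_{c,w}>|^2 for the discrete
    chirp x_k = exp(j omega0 k^2 / 2), probed at c = n // 2 with Gaussian windows of
    width sigma modulated to frequencies w in [0, pi]."""
    k = np.arange(n)
    c = n // 2
    x = np.exp(1j * omega0 * k ** 2 / 2)               # local frequency omega0 * k
    g = np.exp(-0.5 * ((k - c) / sigma) ** 2)          # window centered at c
    freqs = np.linspace(0, np.pi, n_freqs)
    probes = g[None, :] * np.exp(1j * freqs[:, None] * k[None, :])
    power = np.abs(probes.conj() @ x) ** 2             # |inner products|^2
    p = power / power.sum()
    mean = np.sum(freqs * p)
    return mean, np.sqrt(np.sum((freqs - mean) ** 2 * p))

for sigma in (8, 32, 256):
    mean, spread = probe_frequency_spread(sigma)
    predicted = np.sqrt(1 / (2 * sigma ** 2) + (1e-3 * sigma) ** 2 / 2)
    print(sigma, round(mean, 3), round(spread, 4), round(predicted, 4))
```

The mean stays at the local frequency $\omega_0 c \approx 1.024$, while the spread is smallest for the medium window.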

Other Localization Measures While the uncertainty principle uses a spreading measure akin to standard deviation, other measures can be defined. Though they typically lack fundamental bounds of the kind given by the uncertainty principle (6.8), they can be quite useful as well as intuitive. One such measure, easily applicable to functions that are symmetric in both time and frequency, finds the centered intervals containing a given fraction $\beta$ of the energy in time and frequency (where $\beta$ is typically 0.90 or 0.95).

102 Bats use such chirp functions to hunt for bugs.

For a unit-norm function $x(t)$ symmetric around $\mu_t$, with the Fourier transform $X(\omega)$ (of norm $\|X\|^2 = 2\pi$) symmetric around $\mu_\omega$, the time spread $\Delta_t^{(\beta)}$ and frequency spread $\Delta_\omega^{(\beta)}$ are now defined such that
$$\int_{\mu_t - \Delta_t^{(\beta)}}^{\mu_t + \Delta_t^{(\beta)}} |x(t)|^2\, dt = \beta, \qquad (6.10a)$$
$$\frac{1}{2\pi} \int_{\mu_\omega - \Delta_\omega^{(\beta)}}^{\mu_\omega + \Delta_\omega^{(\beta)}} |X(\omega)|^2\, d\omega = \beta. \qquad (6.10b)$$

Exercise 6.3 shows that $\Delta_t^{(\beta)}$ and $\Delta_\omega^{(\beta)}$ satisfy the same shift, modulation, and scaling behavior as do $\Delta_t$ and $\Delta_\omega$ in Table 6.1.
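A sketch of the energy-fraction measure in time, assuming a symmetric function and solving for the half-width $h$ of the centered interval $[\mu_t - h, \mu_t + h]$ that holds a fraction $\beta$ of the energy by bisection. For $x(t) = e^{-t^2}$ ($\Delta_t = 1/2$) and $\beta = 0.9$, the half-width should be about $1.645 \cdot 0.5 \approx 0.822$. For the scaled version $\sqrt{2}\,x(2t)$ it should halve, matching the scaling behavior mentioned above. For a box of width 2 it should be $0.9$. The helper name and test functions are illustrative.

```python
import numpy as np

def energy_halfwidth(x_func, beta, mu=0.0, t_max=50.0, n=200001):
    """Half-width h of [mu - h, mu + h] holding a fraction beta of the energy of x."""
    t = np.linspace(mu - t_max, mu + t_max, n)
    dt = t[1] - t[0]
    e = np.abs(x_func(t)) ** 2
    total = e.sum() * dt
    lo, hi = 0.0, t_max
    for _ in range(60):                                    # bisection on h
        h = 0.5 * (lo + hi)
        inside = e[np.abs(t - mu) <= h].sum() * dt
        lo, hi = (h, hi) if inside < beta * total else (lo, h)
    return 0.5 * (lo + hi)

gauss = lambda t: np.exp(-t ** 2)                          # Delta_t = 1/2
box = lambda t: (np.abs(t) <= 1.0).astype(float)           # width 2
h_g = energy_halfwidth(gauss, 0.9)
h_g_scaled = energy_halfwidth(lambda t: np.sqrt(2) * gauss(2 * t), 0.9)
h_box = energy_halfwidth(box, 0.9)
print(h_g, h_g_scaled, h_box)     # about 0.822, 0.411, 0.9
```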

6.2.4 Scale Localization

At the very beginning of the chapter, we discussed the idea of signal analysis as computing an inner product with a probing function. Along with the Heisenberg box 4-tuple, another key property of a probing function is its scale. Scale is closely related to the time spread, but it is inherently a relative (rather than absolute) quantity. Before further describing scale, let us revisit scaling; we will point out the fundamental difference between continuous- and discrete-time scaling operations in the next section.

An energy-conserving rescaling of $x(t)$ by a factor $a \in \mathbb{R}^+$ was given in (6.7). Clearly, this is a reversible process, since rescaling $y(t) = \sqrt{a}\, x(at)$ by $1/a$ gives
$$\frac{1}{\sqrt{a}}\, y\!\left(\frac{t}{a}\right) = x(t).$$

We can gain an intuitive understanding of scale by examining maps. The usual notion of scale in maps is the following: in a map at scale 1:100,000, an object of length 1 km is represented by a length of $(10^3\,\text{m})/10^5 = 1$ cm. That is, the scale factor $a = 10^5$ is used as a contraction factor, to map a reality $x(t)$ into a scaled version $y(t) = \sqrt{a}\, x(at)$ (with the energy normalization factor $\sqrt{a}$ of no real significance, because reality and a map are not of the same dimension). However, reality does provide us with something important: a baseline scale against which to compare the map.

When we look at functions in $\mathcal{L}^2(\mathbb{R})$, a baseline scale does not necessarily exist. When $y(t) = \sqrt{a}\, x(at)$, we say that $y$ is at a larger scale if $a > 1$, and at a smaller scale if $a \in (0, 1)$. There is no absolute scale for $y$ unless we arbitrarily define a scale for $x$.

Figure 6.8: Aerial photographs of the EPF Lausanne campus at various scales. (a) 5,000. (b) 10,000. (c) 20,000.

Figure 6.9: Signals with features at different scales require probing functions adapted to the scales of those features. (a) A wide-area feature requires a wide probing function. (b) A narrow-area feature requires a sharp probing function.

Now consider the use of a real probing function $\varphi(t)$ to extract some information about $x(t)$. If we compute the inner product between the probing function and a scaled function, we get
$$\langle \sqrt{a}\, x(at), \varphi(t) \rangle = \sqrt{a} \int_{-\infty}^{\infty} x(at)\, \varphi(t)\, dt = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(\tau)\, \varphi\!\left(\frac{\tau}{a}\right) d\tau = \left\langle x(t), \frac{1}{\sqrt{a}}\, \varphi(t/a) \right\rangle. \qquad (6.11)$$
Probing a contracted function is equivalent to stretching the probe, thus emphasizing that scale is relative. If only stretched and contracted versions of a single probe are available, large-scale features in $x(t)$ are seen using stretched probing functions, while small-scale features (fine details) in $x(t)$ are seen using contracted probing functions.

In summary, large scales $a \gg 1$ correspond to contracted versions of reality, or to widely spread probing functions. This duality is inherent in the inner product (6.11). Figure 6.8 shows an aerial photograph with different scale factors as per our convention, while Figure 6.9 shows the interaction of signals with different-size features and probing functions.
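Identity (6.11) can be checked numerically with illustrative choices, all of them assumptions: $x(t) = e^{-(t-1)^2}$, the real probe $\varphi(t) = (1 - t^2)\,e^{-t^2/2}$, and $a = 2.5$. Probing the contracted function with $\varphi$ gives the same number as probing $x$ with the stretched probe $\varphi(t/a)/\sqrt{a}$.

```python
import numpy as np

t = np.linspace(-40, 40, 800001)
dt = t[1] - t[0]
x = lambda s: np.exp(-(s - 1.0) ** 2)
phi = lambda s: (1 - s ** 2) * np.exp(-s ** 2 / 2)   # real probing function
a = 2.5

lhs = np.sum(np.sqrt(a) * x(a * t) * phi(t)) * dt    # <sqrt(a) x(at), phi(t)>
rhs = np.sum(x(t) * phi(t / a) / np.sqrt(a)) * dt    # <x(t), phi(t/a) / sqrt(a)>
print(lhs, rhs)
```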

6.3 Localization for Sequences

Thus far, we have restricted our attention to the study of localization properties and bounds for functions. Analogous results for sequences are neither as elegant nor as parallel, except in the case of strictly lowpass sequences. Thus, the uncertainty principle we present here holds only for those sequences in $\ell^2(\mathbb{Z})$ whose DTFT is strictly lowpass in nature, that is, when $X(e^{j\pi}) = 0$.

6.3.1 Time Localization 

Consider a sequence $x_n \in \ell^2(\mathbb{Z})$, where n is a discrete-time index. We now discuss localization of the sequence in time. Similarly to functions, when the sequence is finitely supported, its DTFT is not (it can only have isolated zeros); that is, a sequence cannot be perfectly localized in both time and frequency (see Exercise 6.4). Even if not of finite support, a sequence might still decay rapidly as $n \to \pm\infty$. Similarly to functions, we describe locality concisely by introducing the time center and the time spread.

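Definition 6.4 (Time center and spread for sequences) Let $x_n$ be a sequence in $\ell^2(\mathbb{Z})$ with norm $\|x\|$. Its time center $\mu_n$ and time spread $\Delta_n$ are

$$\mu_n = \frac{1}{\|x\|^2}\sum_{n\in\mathbb{Z}} n\,|x_n|^2, \qquad (6.12a)$$

$$\Delta_n^2 = \frac{1}{\|x\|^2}\sum_{n\in\mathbb{Z}} (n-\mu_n)^2\,|x_n|^2. \qquad (6.12b)$$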

Example 6.4 (Time spreads for sequences) Consider the following sequences and their time spreads:

(i) The sinc sequence from Table 3.6 has $\mu_n = 0$ and infinite $\Delta_n^2$, as $|x_n|^2$ decays only as $|n|^{-2}$.
(ii) The box sequence from Table 3.6 has $\mu_n = 0$ and $\Delta_n^2 = (n_0^2 - 1)/12$.

As for functions, the example shows how widely time spreads can vary. The time center and spread satisfy the following (see Exercise 6.5):

(i) With the shift of a sequence in time, $y_n = x_{n-n_0}$,
$$\mu_{y,n} = \mu_{x,n} + n_0, \qquad (6.13a)$$
$$\Delta_{y,n} = \Delta_{x,n}, \qquad (6.13b)$$
that is, the time center shifts and the time spread is invariant.

(ii) With the upsampling of a sequence in time followed by lowpass postfiltering, $y_n = \frac{1}{\sqrt{N}}\,\mathrm{sinc}(\pi n/N) *_n x_{n/N}$,
$$\mu_{y,n} = N\mu_{x,n}, \qquad (6.13c)$$
$$\Delta_{y,n} = N\Delta_{x,n}, \qquad (6.13d)$$
that is, both the time center and the time spread scale.

(iii) With the downsampling of a bandlimited sequence, $x \in \mathrm{BL}[-\pi/N, \pi/N]$,
$$\mu_{y,n} = \frac{1}{N}\mu_{x,n}, \qquad (6.13e)$$
$$\Delta_{y,n} = \frac{1}{N}\Delta_{x,n}, \qquad (6.13f)$$
that is, both the time center and the time spread scale.

If the sequence is not bandlimited, downsampling by N keeps only its 0th polyphase component. Since the norm of that component depends on the sequence, we cannot tell how the downsampled sequence will be scaled.
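Definition 6.4, Example 6.4(ii) and the shift property (6.13a)-(6.13b) can be checked directly on a finitely supported sequence. A minimal sketch, assuming a unit-norm box of odd length n0 = 7 centered at 0 (the exact normalization in Table 3.6 does not matter, since (6.12a)-(6.12b) divide by the energy); the helper name time_center_spread is hypothetical.

```python
import math


def time_center_spread(x, start=0):
    """Time center mu_n and time spread Delta_n, as in (6.12a)-(6.12b).

    x lists the samples x_start, x_{start+1}, ...; all other samples are zero.
    """
    energy = sum(abs(v) ** 2 for v in x)
    mu = sum(n * abs(v) ** 2 for n, v in enumerate(x, start)) / energy
    var = sum((n - mu) ** 2 * abs(v) ** 2 for n, v in enumerate(x, start)) / energy
    return mu, math.sqrt(var)


n0 = 7                                   # box length (odd, so it can be centered at 0)
box = [1 / math.sqrt(n0)] * n0           # unit-norm box on n = -3, ..., 3
mu, delta = time_center_spread(box, start=-(n0 - 1) // 2)
print(mu, delta ** 2, (n0 ** 2 - 1) / 12)

# Shift by 5 samples, y_n = x_{n-5}: by (6.13a)-(6.13b) the center moves, the spread does not.
mu_s, delta_s = time_center_spread(box, start=-(n0 - 1) // 2 + 5)
print(mu_s, delta_s)
```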
6.3.2 Frequency Localization



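Definition 6.5 (Frequency center and spread for sequences) Let $x_n$ be a sequence in $\ell^2(\mathbb{Z})$ with the DTFT $X(e^{j\omega})$ of norm $\|X\|^2 = 2\pi\|x\|^2$. Its frequency center $\mu_\omega$ and frequency spread $\Delta_\omega$ are

$$\mu_\omega = \frac{1}{2\pi\|x\|^2}\int_{-\pi}^{\pi} \omega\,|X(e^{j\omega})|^2\,d\omega, \qquad (6.14a)$$

$$\Delta_\omega^2 = \frac{1}{2\pi\|x\|^2}\int_{-\pi}^{\pi} (\omega-\mu_\omega)^2\,|X(e^{j\omega})|^2\,d\omega. \qquad (6.14b)$$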



Note that the frequency center is 0 for all real sequences, because the magnitude of their DTFT is even.

Example 6.5 (Frequency spreads for sequences) We consider the same sequences as in Example 6.4 and their frequency spreads:

(i) The DTFT of the sinc sequence from Table 3.6 has $\mu_\omega = 0$ and $\Delta_\omega^2 = \omega_0^2/12$.
(ii) The DTFT of the box sequence from Table 3.6 has $\mu_\omega = 0$; its $\Delta_\omega^2$ is no longer infinite, as this DTFT is the Dirichlet kernel (see Figure 4.36).

The frequency center and spread satisfy the following (see Exercise 6.5):

(i) With the shift of the sequence in frequency, $Y(e^{j\omega}) = X(e^{j(\omega-\omega_0)})$,
$$\mu_{y,\omega} = \mu_{x,\omega} + \omega_0, \qquad (6.15a)$$
$$\Delta_{y,\omega} = \Delta_{x,\omega}, \qquad (6.15b)$$
that is, the frequency center shifts and the frequency spread is invariant.

(ii) With the upsampling of the sequence in time followed by lowpass postfiltering, $Y(e^{j\omega}) = \sqrt{N}\,X(e^{jN\omega})$ for $|\omega| < \pi/N$,
$$\mu_{y,\omega} = \frac{1}{N}\mu_{x,\omega}, \qquad (6.15c)$$
$$\Delta_{y,\omega} = \frac{1}{N}\Delta_{x,\omega}, \qquad (6.15d)$$
that is, both the frequency center and the frequency spread scale.

(iii) With the downsampling of a bandlimited sequence, $x \in \mathrm{BL}[-\pi/N, \pi/N]$,
$$\mu_{y,\omega} = N\mu_{x,\omega}, \qquad (6.15e)$$
$$\Delta_{y,\omega} = N\Delta_{x,\omega}, \qquad (6.15f)$$
that is, both the frequency center and the frequency spread scale.
Figure 6.10: The time-frequency plane. The sequence $x_n$ with the DTFT $X(e^{j\omega})$ has an associated Heisenberg box centered at $(\mu_n, \mu_\omega)$ of width $2\Delta_n$ and height $2\Delta_\omega$.
Figure 6.11: Shifting of a sequence with the Heisenberg box centered at $(\mu_n, \mu_\omega)$ of width $2\Delta_n$ and height $2\Delta_\omega$. (a) Shifting in time by $n_0$, $x_{n-n_0}$, shifts the Heisenberg box to $(\mu_n + n_0, \mu_\omega)$; the size remains the same. (b) Shifting in frequency by $\omega_0$ (modulating), $X(e^{j(\omega-\omega_0)})$, shifts the Heisenberg box to $(\mu_n, \mu_\omega + \omega_0)$; the size remains the same.



6.3.3 Uncertainty Principle for Sequences 

Heisenberg Box Similarly to functions, given a sequence $x_n$ and its DTFT $X(e^{j\omega})$, we have just introduced the 4-tuple $(\mu_n, \Delta_n, \mu_\omega, \Delta_\omega)$, describing the sequence's center in time and frequency $(\mu_n, \mu_\omega)$ and its spread in time and frequency $(\Delta_n, \Delta_\omega)$. As before, we show this pictorially (see Figure 6.10), with the center of mass $(\mu_n, \mu_\omega)$ around which a rectangular box of width $2\Delta_n$ and height $2\Delta_\omega$ is located, again producing a Heisenberg box, but this time for sequences. As before, we show only the first quadrant of the time-frequency plane; we reserve the second quadrant for the magnitude response of the DTFT of the sequence, $|X(e^{j\omega})|$, and the fourth quadrant for the absolute value of the sequence itself, $|x_n|$.

From our previous discussion on time and frequency shifting, we know that a sequence obtained by a shift and modulation of $x_n$,

$$e^{j\omega_0 n}\,x_{n-n_0} \;\overset{\mathrm{DTFT}}{\longleftrightarrow}\; e^{-j(\omega-\omega_0)n_0}\,X(e^{j(\omega-\omega_0)}), \qquad (6.16)$$

will have a Heisenberg box of the same size, simply shifted by $n_0$ and $\omega_0$ (dashed boxes in Figure 6.11; we separated the effects of time and frequency shifts for clarity).

Figure 6.12: Upsampling of a sequence with the Heisenberg box centered at $(\mu_n, \mu_\omega)$ of width $2\Delta_n$ and height $2\Delta_\omega$ shifts the Heisenberg box to $(N\mu_n, \mu_\omega/N)$ and scales its width to $2N\Delta_n$ and its height to $2\Delta_\omega/N$. The repeated spectra in frequency appear because of upsampling.

Similarly, a sequence obtained by upsampling of $x_n$,

$$x_{n/N} \;\overset{\mathrm{DTFT}}{\longleftrightarrow}\; X(e^{jN\omega}), \qquad (6.17)$$

will have a Heisenberg box of the same area, scaled appropriately in time and frequency. That is, if $x_n$ has a Heisenberg box specified by $(\mu_n, \Delta_n, \mu_\omega, \Delta_\omega)$, then the upsampled sequence has a Heisenberg box specified by $(N\mu_n, N\Delta_n, \mu_\omega/N, \Delta_\omega/N)$ (dashed box in Figure 6.12). The effects of shift, modulation and scaling on Heisenberg boxes are summarized in Table 6.2.



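| Sequence | Time center | Time spread | DTFT | Frequency center | Frequency spread |
|---|---|---|---|---|---|
| $x_n$ | $\mu_n$ | $\Delta_n$ | $X(e^{j\omega})$ | $\mu_\omega$ | $\Delta_\omega$ |
| $x_{n-n_0}$ | $\mu_n + n_0$ | $\Delta_n$ | $e^{-j\omega n_0}X(e^{j\omega})$ | $\mu_\omega$ | $\Delta_\omega$ |
| $e^{j\omega_0 n}x_n$ | $\mu_n$ | $\Delta_n$ | $X(e^{j(\omega-\omega_0)})$ | $\mu_\omega + \omega_0$ | $\Delta_\omega$ |
| bandlimited, downsampled | $\mu_n/N$ | $\Delta_n/N$ | $\frac{1}{\sqrt{N}}X(e^{j\omega/N})$ | $N\mu_\omega$ | $N\Delta_\omega$ |
| upsampled & postfiltered | $N\mu_n$ | $N\Delta_n$ | $X(e^{jN\omega})$ | $\mu_\omega/N$ | $\Delta_\omega/N$ |

Table 6.2: Effect of a shift in time and frequency, as well as of upsampling followed by postfiltering and of downsampling of a bandlimited sequence, on a Heisenberg box $(\mu_n, \Delta_n, \mu_\omega, \Delta_\omega)$.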
Figure 6.13: The Heisenberg box for the Kronecker delta sequence.

Example 6.6 (Heisenberg box for the Kronecker delta sequence) The Kronecker delta sequence from (2.7) is interesting to discuss, as it seems to possess perfect time localization; in fact, both $\mu_n = 0$ and $\Delta_n = 0$. This means that the Heisenberg box has no width, so that $\Delta_n\Delta_\omega = 0$. This does not violate Theorem 6.6, as the theorem requires $X(e^{j\pi}) = 0$, which is clearly not satisfied by the DTFT of the Kronecker delta sequence, since it is constant everywhere, $X(e^{j\omega}) = 1$. Thus, this sequence is not a strictly lowpass sequence as we mentioned at the beginning of this section, a fact that has consequences in how we visualize its time-frequency localization properties using Heisenberg boxes. We can still draw its Heisenberg box, bearing in mind that it will be a line. For $x_n = \delta_{n-3}$, the Heisenberg 4-tuple is $(3, 0, 0, \pi/\sqrt{3})$, since $\Delta_\omega^2 = \pi^2/3$; Figure 6.13 shows the corresponding Heisenberg box.
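The 4-tuple for $x_n = \delta_{n-3}$ can be reproduced numerically from (6.12a)-(6.12b) and (6.14a)-(6.14b). A minimal sketch, assuming a midpoint rule on $[-\pi, \pi)$ for the frequency-domain integrals; the helper names are hypothetical.

```python
import cmath
import math


def time_center_spread(x, start=0):
    """mu_n and Delta_n from (6.12a)-(6.12b); x holds x_start, x_{start+1}, ..."""
    e = sum(abs(v) ** 2 for v in x)
    mu = sum(n * abs(v) ** 2 for n, v in enumerate(x, start)) / e
    var = sum((n - mu) ** 2 * abs(v) ** 2 for n, v in enumerate(x, start)) / e
    return mu, math.sqrt(var)


def freq_center_spread(x, start=0, grid=20000):
    """mu_omega and Delta_omega from (6.14a)-(6.14b), midpoint rule on [-pi, pi)."""
    dw = 2 * math.pi / grid
    ws = [-math.pi + (i + 0.5) * dw for i in range(grid)]
    # |X(e^{jw})|^2 with X(e^{jw}) = sum_n x_n e^{-jwn}
    p = [abs(sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x, start))) ** 2
         for w in ws]
    e = sum(p)  # proportional to 2*pi*||x||^2 (Parseval); the grid step cancels
    mu = sum(w * q for w, q in zip(ws, p)) / e
    var = sum((w - mu) ** 2 * q for w, q in zip(ws, p)) / e
    return mu, math.sqrt(var)


# x_n = delta_{n-3}: a single unit sample at n = 3.
mu_n, d_n = time_center_spread([1.0], start=3)
mu_w, d_w = freq_center_spread([1.0], start=3)
print((mu_n, d_n, mu_w, d_w))        # Heisenberg 4-tuple
print(d_w ** 2, math.pi ** 2 / 3)    # squared frequency spread
print(d_n * d_w)                     # zero width, hence a zero product
```

The zero product is possible only because $X(e^{j\pi}) \neq 0$ here, so Theorem 6.6 does not apply.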

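Uncertainty Principle With these definitions paralleling those for continuous-time functions, we can obtain a result very similar to Theorem 6.3. One could imagine that it follows from combining Theorem 6.3 with Nyquist-rate sampling of a bandlimited function. The proof is given in Solved Exercise 6.3 and uses the Cauchy-Schwarz inequality, similarly to our earlier proof.

Theorem 6.6 (Uncertainty principle for sequences) Given a sequence $x_n \in \ell^2(\mathbb{Z})$ with a strictly lowpass DTFT, $X(e^{j\pi}) = 0$, the product of its squared time and frequency spreads is strictly bounded from below by 1/4:

$$\Delta_n^2\,\Delta_\omega^2 > \frac{1}{4}. \qquad (6.18)$$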
A standard example illustrating the tension between time and frequency localization is the analysis of a sequence containing a Kronecker delta sequence in time and a Dirac delta function in frequency (complex sinusoid/exponential):

$$x_n = \delta_{n-n_0} + e^{j\omega_0 n} \;\overset{\mathrm{DTFT}}{\longleftrightarrow}\; X(e^{j\omega}) = e^{-j\omega n_0} + 2\pi\,\delta(\omega-\omega_0). \qquad (6.19)$$



Clearly, to locate the Kronecker delta sequence in time or the Dirac delta function in frequency, one needs to be as sharp as possible in that particular domain, thus compromising the sharpness in the other domain. We illustrate this for periodic sequences and their DFTs in Example 6.7.

Figure 6.14: Stretching of a sequence by a factor 2. Envelopes in each case are drawn to emphasize that stretching is in fact upsampling followed by postfiltering as in Figure 4.13(b), with g an ideal halfband filter. (a) Original sequence. (b) Result after upsampling by 2. (c) Result after filtering with an ideal halfband filter.

Figure 6.15: Contraction of a sequence by a factor 2. Envelopes in each case are drawn to emphasize that contraction is in fact prefiltering followed by downsampling as in Figure 4.13(a), with g an ideal halfband filter. (a) Original sequence. (b) Result after filtering with an ideal halfband filter. (c) Result after downsampling by 2.



6.3.4 Scale Localization 

Unlike for functions, the notion of scaling for sequences is not as natural. For example, downsampling by a factor of N is in general a lossy operation, while upsampling by a factor of N introduces spurious zeros. What we will consider natural scaling operations in the discrete domain are sampling (lowpass prefiltering followed by downsampling) and interpolation (upsampling followed by lowpass postfiltering), operations we introduced in Chapter 4. These are illustrated in Figures 6.14 and 6.15.

Thus, scale changes are more complicated in discrete time. In particular, compressing the time axis cannot be undone, since samples are lost in the process. Scale changes by rational factors are possible through combinations of integer upsampling and downsampling, but cannot be undone in general (see Exercise 6.6).
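The asymmetry between stretching and contraction can be seen on a small example. The sketch below is only an illustration under stated assumptions: it works on finite-length sequences with circular (DFT-domain) ideal halfband filters instead of the infinite-length filters of Chapter 4, uses a naive O(N^2) DFT, and requires odd lengths before stretching so that no DFT bin falls exactly at pi/2. The helper names stretch and contract are hypothetical.

```python
import cmath


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def halfband(X, gain):
    """Ideal circular halfband filter in the DFT domain: keep bins with |omega| < pi/2."""
    M = len(X)
    return [gain * X[k] if min(k, M - k) < M / 4 else 0 for k in range(M)]


def stretch(x):
    """Upsample by 2 (insert zeros), then ideal halfband postfilter with gain 2.

    Assumes len(x) is odd so that no DFT bin sits exactly at omega = pi/2.
    """
    u = []
    for v in x:
        u.extend([v, 0.0])
    return idft(halfband(dft(u), 2))


def contract(x):
    """Ideal halfband prefilter with gain 1, then downsample by 2 (len(x) even)."""
    return idft(halfband(dft(x), 1))[::2]


x = [1.0, -2.0, 3.0, 0.5, -1.0]
back = contract(stretch(x))              # stretching is undone by contracting
d = [1.0] + [0.0] * 9
lost = stretch(contract(d))              # contracting a wideband sequence loses detail
print(max(abs(a - b) for a, b in zip(back, x)))
print(max(abs(a - b) for a, b in zip(lost, d)))
```

The first difference is at rounding level, while the second is not: after contraction only the lowpass part of the Kronecker delta survives, matching the statement that compressing the time axis cannot be undone.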
Figure 6.16: Time-frequency resolution trade-off for a finite-length sequence consisting of a Kronecker delta sequence in time and in frequency. (a) Magnitude of the original sequence of length 256, $x_n$. (b) Magnitude response of its length-256 DFT, $X_k$. (c) Magnitude responses of the length-16 DFTs of the original sequence split into 16 pieces of length 16 each. (d) Magnitudes of the length-16 inverse DFTs of the original DFT split into 16 pieces of length 16 each.



6.3.5 Uncertainty Principle for Finite-Length Sequences 

In addition to the uncertainty principle for infinite sequences, there exists a simple 
and powerful uncertainty principle for finite-length sequences and their DFTs. 



Theorem 6.7 (Uncertainty principle for finite-length sequences) Let $x_n \in \mathbb{C}^N$ with the DFT $X_k$. Let $N_n$ and $N_k$ denote the number of nonzero components of x and X, respectively. Then,

$$N_n N_k \ge N. \qquad (6.20)$$

It turns out that X cannot have $N_n$ consecutive zeros (mod N). This result is explored in Exercise 6.4 and arises again in Chapter 13.
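Theorem 6.7 and the consecutive-zeros property are easy to check on small examples. A minimal sketch, assuming N = 16, a naive O(N^2) DFT, and a tolerance of 1e-9 for deciding that a computed DFT value is zero; the three test sequences are hypothetical choices. The picket fence (a nonzero sample every 4 positions) attains equality in (6.20).

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, X_k = sum_n x_n W_N^{kn} with W_N = exp(-j 2 pi / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def support_size(v, tol=1e-9):
    """Number of components whose magnitude exceeds a numerical tolerance."""
    return sum(1 for c in v if abs(c) > tol)


def longest_circular_zero_run(v, tol=1e-9):
    """Longest run of (numerically) zero entries, with indices taken mod len(v)."""
    zero = [abs(c) <= tol for c in v]
    if all(zero):
        return len(v)
    best = run = 0
    for z in zero + zero:  # scan twice so that runs wrapping past the end are counted
        run = run + 1 if z else 0
        best = max(best, run)
    return best


N = 16
examples = {
    "delta": [1.0] + [0.0] * (N - 1),
    "picket fence": [1.0 if n % 4 == 0 else 0.0 for n in range(N)],
    "three taps": [1.0, -1.0, 0.0, 0.0, 0.0, 2.0] + [0.0] * (N - 6),
}
results = {}
for name, x in examples.items():
    X = dft(x)
    Nn, Nk, run = support_size(x), support_size(X), longest_circular_zero_run(X)
    results[name] = (Nn, Nk, run)
    print(f"{name}: N_n={Nn}, N_k={Nk}, N_n*N_k >= N: {Nn * Nk >= N}, "
          f"longest zero run {run} < N_n: {run < Nn}")
```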
Example 6.7 (A Kronecker delta sequence in time and frequency) We illustrate the fundamental trade-off between time and frequency localization with a numerical example. Consider a finite-length sequence of length N = 256 containing a Kronecker delta sequence in time and one in frequency (a complex exponential), as shown in Figure 6.16(a). In Figure 6.16(b), we show the magnitude of its length-256 DFT, which has 256 frequency bins and perfectly identifies the exponential component, while missing the time-domain Kronecker delta completely. To increase the time resolution, we divide the sequence into 16 pieces of length 16 each, taking the length-16 DFT of each, as shown in Figure 6.16(c). Now we can identify approximately where the time-domain Kronecker delta impulse occurs; however, the frequency resolution is reduced, since we now have only 16 frequency bins. Finally, Figure 6.16(d) shows the dual case, that is, we plot the magnitudes of length-16 inverse DFTs of the original DFT split into 16 pieces of length 16 each. Now we can identify approximately where the frequency-domain Kronecker delta impulse occurs; however, the time resolution is reduced, since we now have only 16 time bins.
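Panels (b) and (c) of Example 6.7 can be reproduced in a few lines. A minimal sketch, assuming hypothetical positions n0 = 100 for the time-domain delta and k0 = 32 for the exponential (the example does not state them), and a naive O(N^2) DFT rather than an FFT.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, X_k = sum_n x_n exp(-j 2 pi k n / N); fine for N = 256."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


N, L = 256, 16       # total length and block length, as in Example 6.7
n0, k0 = 100, 32     # hypothetical positions of the two Kronecker deltas

# Kronecker delta in time plus a Kronecker delta in frequency (complex exponential).
x = [(1.0 if n == n0 else 0.0) + cmath.exp(2j * cmath.pi * k0 * n / N) for n in range(N)]

# Panel (b): one length-256 DFT. The exponential gives a single peak at k0; the
# time-domain delta only adds a flat floor of magnitude 1 that carries no position.
X = dft(x)
peak_bin = max(range(N), key=lambda k: abs(X[k]))

# Panel (c): length-16 DFTs of 16 blocks. The exponential lands on bin k0*L/N of every
# block; only the block holding the delta shows energy in the other bins.
kb = k0 * L // N
leak = []
for b in range(N // L):
    Y = dft(x[b * L:(b + 1) * L])
    leak.append(sum(abs(Y[k]) for k in range(L) if k != kb))
delta_block = max(range(N // L), key=lambda b: leak[b])

print(peak_bin, delta_block)   # delta_block * L <= n0 < (delta_block + 1) * L
```

The long DFT pins down the frequency exactly but not the time of the impulse; the block DFTs locate the impulse only to within a block of 16 samples, at the price of 16 frequency bins.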
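Chapter at a Glance

Uncertainty Principle for Functions

For a function $x(t) \in L^2(\mathbb{R})$ with the Fourier transform $X(\omega)$:

energy in time: $\|x\|^2 = \int_{-\infty}^{\infty} |x(t)|^2\,dt$
time center: $\mu_t = \frac{1}{\|x\|^2}\int_{-\infty}^{\infty} t\,|x(t)|^2\,dt$
time spread: $\Delta_t = \left(\frac{1}{\|x\|^2}\int_{-\infty}^{\infty} (t-\mu_t)^2\,|x(t)|^2\,dt\right)^{1/2}$
energy in frequency: $2\pi\|x\|^2 = \int_{-\infty}^{\infty} |X(\omega)|^2\,d\omega$
frequency center: $\mu_\omega = \frac{1}{2\pi\|x\|^2}\int_{-\infty}^{\infty} \omega\,|X(\omega)|^2\,d\omega$
frequency spread: $\Delta_\omega = \left(\frac{1}{2\pi\|x\|^2}\int_{-\infty}^{\infty} (\omega-\mu_\omega)^2\,|X(\omega)|^2\,d\omega\right)^{1/2}$

then $\Delta_t^2\Delta_\omega^2 \ge 1/4$, with equality achieved by a Gaussian x(t).

Uncertainty Principle for Sequences

For a sequence $x_n \in \ell^2(\mathbb{Z})$ with the DTFT $X(e^{j\omega})$ satisfying $X(e^{j\pi}) = 0$:

energy in time: $\|x\|^2 = \sum_{n\in\mathbb{Z}} |x_n|^2$
time center: $\mu_n = \frac{1}{\|x\|^2}\sum_{n\in\mathbb{Z}} n\,|x_n|^2$
time spread: $\Delta_n = \left(\frac{1}{\|x\|^2}\sum_{n\in\mathbb{Z}} (n-\mu_n)^2\,|x_n|^2\right)^{1/2}$
energy in frequency: $2\pi\|x\|^2 = \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega$
frequency center: $\mu_\omega = \frac{1}{2\pi\|x\|^2}\int_{-\pi}^{\pi} \omega\,|X(e^{j\omega})|^2\,d\omega$
frequency spread: $\Delta_\omega = \left(\frac{1}{2\pi\|x\|^2}\int_{-\pi}^{\pi} (\omega-\mu_\omega)^2\,|X(e^{j\omega})|^2\,d\omega\right)^{1/2}$

then $\Delta_n^2\Delta_\omega^2 > 1/4$.

Uncertainty Principle for Finite-Length Sequences

For a sequence $x_n \in \mathbb{C}^N$ with the DFT $X_k$:

number of nonzero components of x: $N_n$
number of nonzero components of X: $N_k$

then $N_n N_k \ge N$.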
Historical Remarks

Uncertainty principles stemming from the Cauchy-Schwarz inequality have a long and rich history. The best known one is Heisenberg's uncertainty principle in quantum physics, first developed in a 1927 essay [70]. Werner Karl Heisenberg (1901-1976) was a German physicist, credited as a founder of quantum mechanics, for which he was awarded the Nobel Prize in 1932. He had seven children, one of whom, Martin Heisenberg, became a celebrated geneticist. He collaborated with Bohr, Pauli and Dirac, among others. While he was initially attacked by the Nazi war machine for promoting Einstein's views, he did head the Nazi nuclear project during the war. His role in the project has been a subject of controversy ever since, with differing views on whether or not he was deliberately stalling Hitler's efforts.




Kennard is credited with the first mathematically exact formulation of the uncertainty principle, and Robertson and Schrödinger provided generalizations. The uncertainty principle presented in Theorem 6.3 was proven by Weyl and Pauli and introduced to signal processing by Dennis Gabor (1900-1979) [55], a Hungarian physicist, and another winner of the Nobel Prize for physics (he is also known as the inventor of holography). By finding a lower bound to $\Delta_t\Delta_\omega$, Gabor was intending to define an information measure or capacity for signals. Shannon's communication theory [130] proved much more fruitful for this purpose, but Gabor's proposal of signal analysis by shifted and modulated Gaussian functions has been a cornerstone of time-frequency analysis ever since. Slepian's survey [133] is enlightening on these topics.






Further Reading 

Many of the uncertainty principles for discrete-time signals are considerably more complicated than Theorem 6.6. We have given only a result that follows papers by Ishii and Furukawa [79] and Calvez and Vilbe [22].

Donoho and Stark [46] derived new uncertainty principles in various domains. Particularly influential was an uncertainty principle for finite-dimensional signals and a demonstration of its significance for signal recovery (see Exercises 6.4 and 6.7). Moreover, Donoho and Huo [45] introduced performance guarantees for $\ell^1$ minimization-based signal recovery algorithms; this has sparked a large body of work.



Exercises with Solutions 

6.1. Shift-Invariant Subspaces and Degrees of Freedom

Show that a function in the shift-invariant space S of piecewise-constant functions (4.1), but over intervals of length T > 0, has exactly 1/T degrees of freedom per unit time.

Solution: Consider an interval (-KT, KT) for some positive integer K. A function in S is specified on this interval by the 2K inner products $\{\langle x, \varphi_k\rangle\}_{k=-K}^{K-1}$. Thus, the function has 2K/(2KT) = 1/T degrees of freedom per unit time. The result follows from using a similar argument for an interval with length not necessarily an integer multiple of T.
6.2. Properties of Time and Frequency Spreads for Functions

Consider the time spread $\Delta_t$ and the frequency spread $\Delta_\omega$ defined in (6.2b) and (6.4b), respectively.

(i) Show that the time shift and frequency shift (complex modulation) of x(t) as in (6.6) leave $\Delta_t$ and $\Delta_\omega$ unchanged.
(ii) Show that the energy-conserving scaling of x(t) as in (6.7) scales $\Delta_t$ by 1/a, while scaling $\Delta_\omega$ by a, thus leaving the time-frequency product unchanged.

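Solution: Without loss of generality, assume $\|x\|^2 = 1$.

(i) The time shift $y(t) = x(t - t_0)$, or $Y(\omega) = e^{-j\omega t_0}X(\omega)$ in the frequency domain, changes $\mu_{x,t}$ as follows:

$$\mu_{y,t} = \int_{-\infty}^{\infty} t\,|y(t)|^2\,dt = \int_{-\infty}^{\infty} t\,|x(t-t_0)|^2\,dt \overset{(a)}{=} \int_{-\infty}^{\infty} (\tau + t_0)\,|x(\tau)|^2\,d\tau = \int_{-\infty}^{\infty} \tau\,|x(\tau)|^2\,d\tau + t_0\int_{-\infty}^{\infty} |x(\tau)|^2\,d\tau \overset{(b)}{=} \mu_{x,t} + t_0,$$

where (a) follows from $\tau = t - t_0$; and (b) from $\|x\|^2 = 1$. The time spread $\Delta_t^2$, however, remains unchanged:

$$\Delta_{y,t}^2 = \int_{-\infty}^{\infty} (t-\mu_{y,t})^2\,|y(t)|^2\,dt = \int_{-\infty}^{\infty} (t-\mu_{x,t}-t_0)^2\,|x(t-t_0)|^2\,dt \overset{(a)}{=} \int_{-\infty}^{\infty} (\tau-\mu_{x,t})^2\,|x(\tau)|^2\,d\tau = \Delta_{x,t}^2,$$

where (a) again follows from $\tau = t - t_0$. Since $|Y(\omega)| = |X(\omega)|$, the frequency spread is clearly not changed by a time shift.

Similarly, the frequency shift $Y(\omega) = X(\omega - \omega_0)$, or $y(t) = e^{j\omega_0 t}x(t)$ in the time domain, changes $\mu_{x,\omega}$ as follows:

$$\mu_{y,\omega} = \frac{1}{2\pi}\int_{\mathbb{R}} \omega\,|Y(\omega)|^2\,d\omega = \frac{1}{2\pi}\int_{\mathbb{R}} \omega\,|X(\omega-\omega_0)|^2\,d\omega \overset{(a)}{=} \frac{1}{2\pi}\int_{\mathbb{R}} (w+\omega_0)\,|X(w)|^2\,dw = \frac{1}{2\pi}\int_{\mathbb{R}} w\,|X(w)|^2\,dw + \omega_0\,\frac{1}{2\pi}\int_{\mathbb{R}} |X(w)|^2\,dw \overset{(b)}{=} \mu_{x,\omega} + \omega_0,$$

where (a) follows from $w = \omega - \omega_0$; and (b) from $\|X\|^2 = 2\pi$ by Parseval's equality. The frequency spread $\Delta_\omega^2$, however, remains unchanged:

$$\Delta_{y,\omega}^2 = \frac{1}{2\pi}\int_{\mathbb{R}} (\omega-\mu_{y,\omega})^2\,|Y(\omega)|^2\,d\omega = \frac{1}{2\pi}\int_{\mathbb{R}} (\omega-\mu_{x,\omega}-\omega_0)^2\,|X(\omega-\omega_0)|^2\,d\omega \overset{(a)}{=} \frac{1}{2\pi}\int_{\mathbb{R}} (w-\mu_{x,\omega})^2\,|X(w)|^2\,dw = \Delta_{x,\omega}^2,$$

where (a) again follows from $w = \omega - \omega_0$. Similarly to the time shift not changing the frequency spread, the frequency shift does not change the time spread, since $|y(t)| = |x(t)|$.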
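(ii) Scaling $y(t) = \sqrt{a}\,x(at)$, or $Y(\omega) = (1/\sqrt{a})\,X(\omega/a)$ in the frequency domain, changes $\mu_{x,t}$ as follows:

$$\mu_{y,t} = \int_{-\infty}^{\infty} t\,|\sqrt{a}\,x(at)|^2\,dt \overset{(a)}{=} \frac{1}{a}\int_{-\infty}^{\infty} \tau\,|x(\tau)|^2\,d\tau = \frac{1}{a}\,\mu_{x,t},$$

where (a) follows from $\tau = at$. The time spread changes as well:

$$\Delta_{y,t}^2 = \int_{-\infty}^{\infty} (t-\mu_{y,t})^2\,|y(t)|^2\,dt = \int_{-\infty}^{\infty} \left(t-\frac{1}{a}\mu_{x,t}\right)^2 |\sqrt{a}\,x(at)|^2\,dt \overset{(a)}{=} \frac{1}{a^2}\int_{-\infty}^{\infty} (\tau-\mu_{x,t})^2\,|x(\tau)|^2\,d\tau = \frac{1}{a^2}\,\Delta_{x,t}^2,$$

where (a) again follows from $\tau = at$.

In the frequency domain, the same scaling, $Y(\omega) = (1/\sqrt{a})\,X(\omega/a)$, changes $\mu_{x,\omega}$ as follows:

$$\mu_{y,\omega} = \frac{1}{2\pi}\int_{\mathbb{R}} \omega\,|Y(\omega)|^2\,d\omega = \frac{1}{2\pi}\int_{\mathbb{R}} \frac{\omega}{a}\,|X(\omega/a)|^2\,d\omega \overset{(a)}{=} \frac{a}{2\pi}\int_{\mathbb{R}} w\,|X(w)|^2\,dw = a\,\mu_{x,\omega},$$

where (a) follows from $w = \omega/a$. The frequency spread changes as well:

$$\Delta_{y,\omega}^2 = \frac{1}{2\pi}\int_{\mathbb{R}} (\omega-a\mu_{x,\omega})^2\,\frac{1}{a}\,|X(\omega/a)|^2\,d\omega \overset{(a)}{=} \frac{a^2}{2\pi}\int_{\mathbb{R}} (w-\mu_{x,\omega})^2\,|X(w)|^2\,dw = a^2\,\Delta_{x,\omega}^2,$$

where (a) again follows from $w = \omega/a$. Hence $\Delta_{y,t}\Delta_{y,\omega} = \Delta_{x,t}\Delta_{x,\omega}$.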
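The scaling result of part (ii) can be sanity-checked numerically. A minimal sketch, assuming the hypothetical Gaussian pair $x(t) = e^{-t^2/2}$, $X(\omega) = \sqrt{2\pi}\,e^{-\omega^2/2}$, Riemann sums on [-30, 30], and the scale factors a = 0.5, 1, 3. For this Gaussian, $\Delta_t = \Delta_\omega = 1/\sqrt{2}$, so the expected spreads after scaling are $1/(a\sqrt{2})$ and $a/\sqrt{2}$.

```python
import math


def spread(f, lo=-30.0, hi=30.0, steps=40000):
    """Spread of |f|^2 on [lo, hi]: sqrt of its energy-normalized second central moment.

    A Riemann sum is used; the grid step cancels in the ratios.
    """
    h = (hi - lo) / steps
    pts = [lo + i * h for i in range(steps + 1)]
    w = [abs(f(t)) ** 2 for t in pts]
    energy = sum(w)
    mu = sum(t * q for t, q in zip(pts, w)) / energy
    return math.sqrt(sum((t - mu) ** 2 * q for t, q in zip(pts, w)) / energy)


def x(t):
    # Hypothetical Gaussian test function.
    return math.exp(-t * t / 2)


def X(w):
    # Its Fourier transform, X(omega) = sqrt(2 pi) exp(-omega^2 / 2).
    return math.sqrt(2 * math.pi) * math.exp(-w * w / 2)


rows = []
for a in (0.5, 1.0, 3.0):
    d_t = spread(lambda t, a=a: math.sqrt(a) * x(a * t))    # y(t) = sqrt(a) x(at)
    d_w = spread(lambda w, a=a: X(w / a) / math.sqrt(a))    # Y(omega) = X(omega/a)/sqrt(a)
    rows.append((a, d_t, d_w))
    print(a, d_t, d_w, d_t * d_w)
```

The product stays at 1/2 for every a, the value at which the Gaussian meets the bound of Theorem 6.3.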

6.3. Uncertainty Principle for Sequences

Prove Theorem 6.6 for real sequences. Do not forget to provide an argument for the strictness of inequality (6.18).

(Hint: Use the Cauchy-Schwarz inequality (1.24) to bound $\left|\int_{-\pi}^{\pi} \omega X(e^{j\omega})\,\frac{dX^*(e^{j\omega})}{d\omega}\,d\omega\right|^2$.)



Solution: Without loss of generality, assume $\|x\| = 1$. Since $x_n$ is real, according to Table 3.2, $|X(e^{j\omega})|$ is even, so $\mu_\omega = 0$. We thus express the frequency spread from (6.14b) and the time spread from (6.12b) as

$$2\pi\Delta_\omega^2 = \int_{-\pi}^{\pi} |\omega X(e^{j\omega})|^2\,d\omega, \qquad \text{(E6.3-1a)}$$

$$2\pi\Delta_n^2 = 2\pi\sum_n (n-\mu_n)^2 x_n^2 = 2\pi\sum_n |-j(n-\mu_n)x_n|^2 \overset{(a)}{=} \int_{-\pi}^{\pi} \left|\frac{dX(e^{j\omega})}{d\omega} + j\mu_n X(e^{j\omega})\right|^2 d\omega, \qquad \text{(E6.3-1b)}$$

where (a) follows from Parseval's equality (2.103) and the differentiation-in-frequency property of the DTFT (2.90).
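Now let

$$\alpha = \left\langle \omega X(e^{j\omega}), \frac{dX(e^{j\omega})}{d\omega} + j\mu_n X(e^{j\omega})\right\rangle \qquad \text{(E6.3-2a)}$$

$$= \int_{-\pi}^{\pi} \omega X(e^{j\omega})\,\frac{dX^*(e^{j\omega})}{d\omega}\,d\omega - j\mu_n\int_{-\pi}^{\pi} \omega\,|X(e^{j\omega})|^2\,d\omega \overset{(a)}{=} \int_{-\pi}^{\pi} \omega X(e^{j\omega})\,\frac{dX^*(e^{j\omega})}{d\omega}\,d\omega, \qquad \text{(E6.3-2b)}$$

where in (a) the second term vanishes, $\int_{-\pi}^{\pi} \omega|X(e^{j\omega})|^2\,d\omega = 0$, because $|X(e^{j\omega})|$ is even. Then,

$$2\Re(\alpha) = \int_{-\pi}^{\pi} \omega\left(X(e^{j\omega})\frac{dX^*(e^{j\omega})}{d\omega} + X^*(e^{j\omega})\frac{dX(e^{j\omega})}{d\omega}\right)d\omega = \int_{-\pi}^{\pi} \omega\,\frac{d}{d\omega}\big(X(e^{j\omega})X^*(e^{j\omega})\big)\,d\omega = \int_{-\pi}^{\pi} \omega\,\frac{d}{d\omega}|X(e^{j\omega})|^2\,d\omega$$

$$\overset{(a)}{=} \omega\,|X(e^{j\omega})|^2\Big|_{-\pi}^{\pi} - \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega = -2\pi, \qquad \text{(E6.3-2c)}$$

where (a) follows from integration by parts; the boundary term vanishes because $X(e^{\pm j\pi}) = 0$, and the remaining integral equals $2\pi\|x\|^2 = 2\pi$. We can now use (E6.3-2c) to write

$$4\pi^2 = |-2\pi|^2 = 4|\Re(\alpha)|^2 \overset{(a)}{\le} 4|\alpha|^2 \overset{(b)}{=} 4\left|\left\langle \omega X(e^{j\omega}), \frac{dX(e^{j\omega})}{d\omega} + j\mu_n X(e^{j\omega})\right\rangle\right|^2$$

$$\overset{(c)}{\le} 4\int_{-\pi}^{\pi} |\omega X(e^{j\omega})|^2\,d\omega\int_{-\pi}^{\pi} \left|\frac{dX(e^{j\omega})}{d\omega} + j\mu_n X(e^{j\omega})\right|^2 d\omega \overset{(d)}{=} 4(2\pi\Delta_\omega^2)(2\pi\Delta_n^2) = 4\pi^2\cdot 4\Delta_n^2\Delta_\omega^2,$$

where (a) follows because $|\Re(\alpha)| \le |\alpha|$ for any $\alpha \in \mathbb{C}$; (b) from (E6.3-2a); (c) from the Cauchy-Schwarz inequality (1.24); and (d) from (E6.3-1a) and (E6.3-1b), proving that $\Delta_n^2\Delta_\omega^2 \ge 1/4$. The Cauchy-Schwarz inequality holds with equality only when $\beta\omega X(e^{j\omega}) = dX(e^{j\omega})/d\omega + j\mu_n X(e^{j\omega})$ for some constant $\beta$, yielding a Gaussian function. As a Gaussian function is never zero, this contradicts the theorem's condition that $X(e^{j\pi}) = 0$; the theorem thus holds with strict inequality.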
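A numerical illustration of the strict inequality, assuming the hypothetical real sequence $x = (1, 2, 1)$ on n = 0, 1, 2, whose DTFT $e^{-j\omega}(2 + 2\cos\omega)$ vanishes at $\omega = \pi$. By hand, $\Delta_n^2 = 1/3$ and $\Delta_\omega^2 = \pi^2/3 - 5/2 \approx 0.79$, so the product is about 0.263. The sketch below recomputes both spreads from the definitions, with a midpoint rule on $[-\pi, \pi)$ for the frequency integrals.

```python
import cmath
import math


def dtft(x, w):
    """X(e^{jw}) = sum_n x_n e^{-jwn} for x supported on n = 0, 1, ..., len(x) - 1."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))


x = [1.0, 2.0, 1.0]          # hypothetical strictly lowpass real sequence
print(abs(dtft(x, math.pi)))  # X(e^{j pi}) vanishes, as Theorem 6.6 requires

# Time center and squared time spread, (6.12a)-(6.12b).
energy = sum(v * v for v in x)
mu_n = sum(n * v * v for n, v in enumerate(x)) / energy
var_n = sum((n - mu_n) ** 2 * v * v for n, v in enumerate(x)) / energy

# Frequency center and squared frequency spread, (6.14a)-(6.14b), via a midpoint rule.
grid = 20000
dw = 2 * math.pi / grid
ws = [-math.pi + (i + 0.5) * dw for i in range(grid)]
p = [abs(dtft(x, w)) ** 2 for w in ws]
mu_w = sum(w * q for w, q in zip(ws, p)) / sum(p)
var_w = sum((w - mu_w) ** 2 * q for w, q in zip(ws, p)) / sum(p)

print(var_n, var_w, math.pi ** 2 / 3 - 2.5)
print(var_n * var_w)          # strictly greater than 1/4
```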
6.4. Uncertainty Principle for Finite-Length Sequences

Let $x_n \in \mathbb{C}^N$ with the DFT $X_k$. Let $N_n$ and $N_k$ denote the number of nonzero components of x and X, respectively.

(i) Prove that X cannot have $N_n$ consecutive zeros, where consecutive is interpreted mod N.
(Hint: For an arbitrary selection of $N_n$ consecutive components of X, form a linear system relating the nonzero components of x to the selected components of X.)
(ii) Using (i), prove (6.20), the uncertainty principle for finite-length sequences, due to Donoho and Stark [46].

Solution:

(i) Let $i_0, i_1, \ldots, i_{N_n-1}$ be the indices of the $N_n$ nonzero components of x. Denote $y_n = x_{i_n}$ and $z_n = W_N^{i_n}$ for $0 \le n \le N_n - 1$.

We prove the result by contradiction. Suppose there exist $N_n$ consecutive zero components of the DFT of x, $X_{k+m}$ for $0 \le m \le N_n - 1$ (indices mod N), for some $0 \le k \le N - 1$. Observe that

$$X_{k+m} = \sum_{n=0}^{N-1} x_n W_N^{(k+m)n} = \sum_{n=0}^{N_n-1} y_n z_n^{k+m},$$

for any $0 \le m \le N_n - 1$. This system of linear equations can be expressed as

$$\begin{bmatrix} X_k \\ X_{k+1} \\ \vdots \\ X_{k+N_n-1} \end{bmatrix} = Z \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N_n-1} \end{bmatrix}, \quad \text{that is,} \quad \mathbf{X} = Z\,\mathbf{Y},$$

where Z is an $N_n \times N_n$ matrix with elements $Z_{m,n} = z_n^{k+m}$, $0 \le m, n \le N_n - 1$.

By assumption, $\mathbf{X}$ is a zero vector. Vector $\mathbf{Y}$ consists of nonzero components; hence, the matrix Z has a nontrivial null space. Since Z is a square matrix, it must be rank-deficient, that is, its rank is smaller than $N_n$.

However, $Z = \tilde{Z}D$, where $\tilde{Z}_{m,n} = z_n^m$, $0 \le m, n \le N_n - 1$, and $D = \mathrm{diag}(z_n^k)_{0 \le n \le N_n - 1}$. The matrix $\tilde{Z}$ is a Vandermonde matrix as in (1.230), constructed from the distinct values $z_n$ (the indices $i_n$ are distinct elements of $\{0, 1, \ldots, N-1\}$); hence, it is of full rank $N_n$. The matrix D is a diagonal matrix with nonzero diagonal elements; hence, it is also of full rank $N_n$. As a product of two full-rank matrices, Z must also be a full-rank matrix, contradicting our previous statement. Thus, X cannot have $N_n$ consecutive zero components.
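(ii) Arrange the components of X on a circle and start from a nonzero component. Because of (i), each nonzero component can be followed by at most $N_n - 1$ zero components before the next nonzero one. Continuing the argument until we return to the initial point, each of the $N_k$ nonzero components, together with the zeros that follow it, covers at most $N_n$ of the N positions on the circle. Thus, the total number of nonzero components satisfies

$$N_k \ge \left\lceil \frac{N}{N_n} \right\rceil \ge \frac{N}{N_n} \quad\Longrightarrow\quad N_n N_k \ge N.$$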



Exercises 



6.1. Time Spreads for B-Splines

Find the time spreads of the following B-splines:

(i) Linear spline as in (5.33).
(ii) Nth-order B-spline as in (5.31b).

6.2. Uncertainty Principle for Complex Functions

Prove Theorem 6.3 without assuming that x(t) is a real function.

(Hint: The proof requires more than the Cauchy-Schwarz inequality and integration by parts. Use the product rule of differentiation, $(|x(t)|^2)' = x'(t)x^*(t) + x(t)(x^*(t))'$. Also, use that for any $\alpha \in \mathbb{C}$, $|\alpha| \ge |\alpha + \alpha^*|/2$.)



6.3. Properties of Modified Time and Frequency Spreads for Functions

Show that properties (i) and (ii) from Solved Exercise 6.2 hold for the modified time and frequency spreads defined in (6.10a) and (6.10b), respectively.



6.4. Sequences with Finite Number of Nonzero Terms and Their DTFTs 

Show that if a sequence has a finite number of nonzero terms, then its DTFT cannot be 
zero over an interval (that is, it can only have isolated zeros). Conversely, show that if a 
DTFT is zero over an interval, then the corresponding sequence has an infinite number of 
nonzero terms. 



6.5. Properties of Time and Frequency Spreads for Sequences

Consider the time spread $\Delta_n$ and the frequency spread $\Delta_\omega$ defined in (6.12b) and (6.14b), respectively.

(i) Show that the time shift and frequency shift (complex modulation) of $x_n$ leave $\Delta_n$ and $\Delta_\omega$ unchanged.
(ii) Show that upsampling by N followed by ideal lowpass postfiltering of $x_n$ scales $\Delta_n$ by N, while scaling $\Delta_\omega$ by 1/N, thus leaving the time-frequency product unchanged. Show the counterpart of the same result for downsampling of a bandlimited sequence $x \in \mathrm{BL}[-\pi/N, \pi/N]$.

6.6. Rational Scale Changes for Sequences

A scale change by a factor M/N can be achieved by upsampling by M followed by downsampling by N.

(i) Consider a scale change by 3/2, and show that it can be implemented either by upsampling by 3 followed by downsampling by 2, or the converse.
(ii) Let M and N be coprime. Show that a sampling rate change by M/N cannot be undone unless N = 1.

6.7. Signal Recovery Based on the Uncertainty Principle for Finite-Length Sequences

The DFT $X_k$ of a length-N sequence $x_n$ is known to have only $N_k$ nonzero components. Using the result of Solved Exercise 6.4, show that this limited DFT-domain support allows for a recovery of a unique $x_n$ from any M time-domain components provided that

$$2(N - M)N_k < N.$$

(Hint: Show that nonunique recovery leads to a contradiction.)
Intermezzo
Bridging Parts I and II

We are about to embark on Part II of the book, where more advanced topics and more complex tools will be derived, all based on the foundations laid down in Part I of the book. This chapter serves as an intermezzo³ between the two parts.

Tools 

Where do we stand after the coverage of Part I? We have seen a number of basic 
and powerful tools and concepts: 

Geometry The inner product in the Hilbert spaces we consider leads to a familiar, yet powerful, geometrical view of signals and spaces. This view includes orthogonality between signals (vectors) and best approximation on a subspace by orthogonal projection.

Existence of Bases The fact that (separable) Hilbert spaces allow for bases leads 
to natural representations in orthonormal or biorthogonal bases. In the case of 
linear operators, the basis of eigenvectors is particularly attractive. 

Fourier Representations For the signals and systems we consider, systems operate on signals using the convolution operator; this naturally leads to various forms of the Fourier transform. In particular, the eigensequence/eigenfunction property of complex exponentials leads naturally to the Fourier-domain analysis of linear shift-invariant systems.

Sampling and Interpolation The interaction between the discrete and the con- 
tinuous world is realized using sampling and interpolation, two powerful tools that 
connect these two worlds. 

Approximation and Compression When signals cannot be perfectly represented, approximations are necessary. Then, quantitative results on how close an approximated or compressed version is to the original are very useful.



³ An intermezzo is a short connecting instrumental movement in a musical piece.
Adapting Tools to Real-World Problems

These basic results form the foundations upon which to build tools that are not only 
more advanced but also more practical; these tools need to be adapted to real-world 
problems taking into account the following: 

Finiteness and Localization Any real-world signal is of finite duration, although 
we use infinite ones, such as sine waves, as a convenient mathematical abstraction. 
Moreover, real-world signals are typically transient. How do we localize our signal 
analysis? How do we trade off localization (sharpness) in one domain for localization 
in the other, for example time and frequency domains? How do we detect transients? 

Prior Knowledge It helps to know what we are looking for: more often than not, 
we have some prior information about the signal or event we are interested in. Such 
priors on the signal class or the noise will help shape the solution. 

Limitations There are limits on what can be done: in the linear measurement case, 
bounds such as the uncertainty principle set limits to how sharp an analysis can 
ever be. In a noisy setting, estimation theory provides lower bounds on the variance 
of an estimator. For compression and communication, information theory bounds 
the performance of any possible scheme, by rate-distortion and capacity regions. 
Such bounds are useful in at least two fundamental ways: (1) they separate what 
can be done from what is impossible; and (2) they provide yardsticks for comparing 
practical systems to the theoretical performance limits. 

Computational Aspects Whatever the solution, it needs to be computable; for example, some constructive solutions providing bounds on performance lead to hopelessly complex algorithms. Even seemingly simple problems such as best approximation in a frame require exhaustive search, and thus become impractical for real-world problems. Instead, we will seek approximate solutions together with performance bounds. Often, the problem is structured and can thus lead to savings in computation. For problems that are shift invariant or can be decomposed into pieces that are, the FFT can be used. However, since real-world signals are of finite but arbitrary length, no algorithm can use an FFT over the entire signal, since it would not have linear complexity. Thus, time localization is necessary from the computational point of view as well.

Bases and the Time-Frequency Plane 

To visualize localization properties of a representation, we use the time-frequency plane and Heisenberg boxes, conceptual tools we introduced in the last chapter. For example, Figure 6.3 showed a Heisenberg box plotted in the time-frequency plane for a function x(t) with a Fourier transform $X(\omega)$, conveying the idea that there exists a center of mass described by the function's center in time and frequency $(\mu_t, \mu_\omega)$, with a spread in time and frequency $(\Delta_t, \Delta_\omega)$. While this particular figure showed a Heisenberg box for a single function, one can draw these for all functions in a basis, for example, yielding an abstract time-frequency representation. This time-frequency representation is often called a spectrogram, and is used in a variety of applications, from speech and music analysis to mode detection and denoising.

Figure 1.1: Two representations with extreme localization properties. (a) The Kronecker delta representation has perfect localization in time and no localization in frequency. (b) The sine representation has perfect localization in frequency and no localization in time.

As an example, let us consider two extremes of representing a finite-length sequence: the standard basis consisting of shifted Kronecker delta impulses, and the sine basis. From what we have seen in Figure 6.16, we expect that the inner products with Kronecker delta basis sequences will isolate time-local events, while the inner products with the inverse sine basis sequences will isolate frequency-local events; each representation has perfect localization properties in one domain and no localization in the other. This is illustrated in Figure 1.1, where the thick boxes are the Heisenberg boxes for a basis function $\varphi_k$ in each case. For the Kronecker delta basis, the extent of the basis function covers the entire frequency axis (no frequency localization) and just the immediate neighborhood of the Kronecker delta impulse in time (perfect time localization). In contrast, for the sine basis, the extent of the basis function covers the entire time axis (no time localization) and just the immediate neighborhood of the box in frequency (perfect frequency localization).

Haar: An Efficient Orthonormal Basis with Time-Frequency Structure As a simple case study, we now build an orthonormal basis for $\ell^2(\mathbb{Z})$, but with some localization in both domains. For example, starting from the Kronecker delta basis, can we improve its localization in frequency slightly while not trading too much of its time locality? Figure 1.2 illustrates the desired tiling, where each tile from Figure 1.1(a) has been divided in two in frequency, thereby improving frequency localization; the price we pay is slightly worse time localization, as each tile has become twice as wide.

Given this tiling, can we now find sequences that produce it? Let us concentrate first on the lower left-hand tile with the basis function $\varphi_0$. We will search for the simplest $\varphi_0$ which has roughly a time spread of 2 and a frequency spread of $\pi/2$.

Figure 1.2: Desired time-frequency tiling for an orthonormal basis with a slightly better frequency localization than the Kronecker delta basis, but at the price of a slightly worse time localization.

Assume we ask for $\varphi_0$ to be exactly of length 2, that is,

$$\varphi_{0,n} = \cos\theta\,\delta_n + \sin\theta\,\delta_{n-1}, \qquad (1.1)$$



where we have also imposed $\|\varphi_0\| = 1$, as we want it to be a part of an orthonormal basis. From (6.12a), the time center for such a $\varphi_0$ would be $\mu_n = \sin^2\theta$, and from (6.12b), its time spread would satisfy $\Delta_n^2 = \sin^2(2\theta)/4$. How about its frequency behavior? We are looking for $\varphi_0$ to be roughly a halfband lowpass sequence; let us thus ask for it to block the highest frequency $\pi$:

$$\Phi_0(e^{j\omega})\Big|_{\omega=\pi} = \left(\cos\theta + \sin\theta\, e^{-j\omega}\right)\Big|_{\omega=\pi} = \cos\theta - \sin\theta = 0. \qquad (1.2)$$

Solving the above equation yields $\theta = k\pi + \pi/4$ and



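$$\varphi_{0,n} = \frac{1}{\sqrt{2}}(\delta_n + \delta_{n-1}), \qquad \varphi_0 = \frac{1}{\sqrt{2}}\begin{bmatrix} \cdots & 0 & 1 & 1 & 0 & \cdots \end{bmatrix}^T, \qquad (1.3)$$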



with the time center $\mu_n = 1/2$ and time spread $\Delta_n = 1/2$. We now repeat the process and try to find a $\varphi_1$ of length 2 that is a roughly halfband highpass sequence. We can use (1.1) as a general form of a sequence of length 2 and norm 1. What we look for now is a sequence that is orthogonal to the first candidate basis vector $\varphi_0$ (if they are to be a part of an orthonormal basis), that is,

$$\langle \varphi_0, \varphi_1 \rangle = \frac{1}{\sqrt{2}}\cos\theta + \frac{1}{\sqrt{2}}\sin\theta = 0, \qquad (1.4)$$

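which yields $\theta = (2k+1)\pi/2 + \pi/4$, and one possible form of $\varphi_1$:

$$\varphi_{1,n} = \frac{1}{\sqrt{2}}(\delta_n - \delta_{n-1}), \qquad \varphi_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} \cdots & 0 & 1 & -1 & 0 & \cdots \end{bmatrix}^T. \qquad (1.5)$$

Note that while we did not specifically impose it, the resulting sequence is indeed highpass in nature, as $\Phi_1(e^{j\omega})\big|_{\omega=0} = 0$.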
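The closed forms above can be checked numerically. The short sketch below assumes that the time center and time spread of (6.12a) and (6.12b) are the energy-weighted mean and variance of the index $n$; the function names are ours. It checks $\mu_n = \sin^2\theta$ and $\Delta_n^2 = \sin^2(2\theta)/4$ for an arbitrary angle, then confirms orthogonality (1.4) and the zeros of $\Phi_0$ at $\omega = \pi$ and of $\Phi_1$ at $\omega = 0$.

```python
import numpy as np

def template(theta):
    """Length-2, unit-norm sequence (1.1): cos(theta) at n = 0, sin(theta) at n = 1."""
    return np.array([np.cos(theta), np.sin(theta)])

def time_center_spread(phi):
    """Energy-weighted mean and variance of n (our reading of (6.12a), (6.12b))."""
    n = np.arange(len(phi))
    energy = np.abs(phi) ** 2 / np.sum(np.abs(phi) ** 2)
    mu = np.sum(n * energy)
    var = np.sum((n - mu) ** 2 * energy)
    return mu, var

def dtft(phi, omega):
    """DTFT of a finite sequence supported on n = 0, 1, ..., at one frequency."""
    n = np.arange(len(phi))
    return np.sum(phi * np.exp(-1j * omega * n))

theta = 0.3                                   # any angle: check the closed forms
mu, var = time_center_spread(template(theta))
assert np.isclose(mu, np.sin(theta) ** 2)
assert np.isclose(var, np.sin(2 * theta) ** 2 / 4)

phi0 = template(np.pi / 4)                    # (1/sqrt 2)[1, 1], theta = pi/4
phi1 = template(-np.pi / 4)                   # (1/sqrt 2)[1, -1], theta = 3pi/4 + k*pi with k = -1
print(time_center_spread(phi0))               # approximately (0.5, 0.25): spread 1/2
print(np.dot(phi0, phi1))                     # approximately 0: orthogonality (1.4)
print(abs(dtft(phi0, np.pi)), abs(dtft(phi1, 0.0)))  # both approximately 0
```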
So far so good; we only have infinitely many more functions to find. To make the task less daunting, we search for an easier way, for example, by shifting $\varphi_0$ and $\varphi_1$ along the time axis by integer multiples of 2. Call $\varphi_{2k,n} = \varphi_{0,n-2k}$ and $\varphi_{2k+1,n} = \varphi_{1,n-2k}$. Then indeed, the set $\Phi = \{\varphi_k\}_{k\in\mathbb{Z}}$ is an orthonormal set, which:

(i) possesses structure in terms of time and frequency localization properties (it 
serves as an almost perfect localization tool in time, and a rather rough one 
in frequency); 

(ii) is efficient (it is built from two template functions and their shifts). 

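As we well know from Chapter 1, orthonormality of a set is not enough for that set to form an orthonormal basis; it must be complete as well. We now show that this set is complete. An easy way to do this is to observe that $\varphi_{2k}$ and $\varphi_{2k+1}$ each operate on only two samples of the input sequence: those at $n = 2k, 2k+1$. Thus, it is enough to show that $\varphi_{2k}, \varphi_{2k+1}$ form an orthonormal basis for those $x \in \ell^2(\mathbb{Z})$ which are nonzero only for $n = 2k, 2k+1$. This is further equivalent to showing that the vectors $(1/\sqrt{2})\begin{bmatrix} 1 & 1 \end{bmatrix}^T$ and $(1/\sqrt{2})\begin{bmatrix} 1 & -1 \end{bmatrix}^T$ form an orthonormal basis for $\mathbb{R}^2$, a trivial fact. As this argument holds for all $k$, we indeed have that $\Phi$ is an orthonormal basis for $\ell^2(\mathbb{Z})$, known under the name Haar basis:

$$\begin{aligned} x &= \sum_{k\in\mathbb{Z}} \langle x, \varphi_k\rangle\, \varphi_k = \sum_{k\in\mathbb{Z}} \langle x, \varphi_{2k}\rangle\, \varphi_{2k} + \sum_{k\in\mathbb{Z}} \langle x, \varphi_{2k+1}\rangle\, \varphi_{2k+1} \\ &= \sum_{k\in\mathbb{Z}} \frac{1}{\sqrt{2}}(x_{2k} + x_{2k+1})\, \varphi_{2k} + \sum_{k\in\mathbb{Z}} \frac{1}{\sqrt{2}}(x_{2k} - x_{2k+1})\, \varphi_{2k+1}, \end{aligned} \qquad (1.6)$$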

where we have separated the basis functions into two groups: those obtained by shifting the lowpass template $\varphi_0$, and those obtained by shifting the highpass template $\varphi_1$.
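A minimal sketch of the expansion (1.6), assuming a finite, even-length sequence in place of an element of $\ell^2(\mathbb{Z})$ (harmless here, since each pair of basis functions acts on one pair of samples only). The function names are ours. It computes the two groups of coefficients and resynthesizes $x$ from them.

```python
import numpy as np

def haar_coefficients(x):
    """Inner products <x, phi_2k> and <x, phi_2k+1> from (1.6), for even-length x."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    alpha = (even + odd) / np.sqrt(2)    # coefficients on the lowpass group
    beta = (even - odd) / np.sqrt(2)     # coefficients on the highpass group
    return alpha, beta

def haar_reconstruct(alpha, beta):
    """Weighted sum of the basis functions, as in (1.6)."""
    x = np.empty(2 * len(alpha))
    x[0::2] = (alpha + beta) / np.sqrt(2)   # sample at n = 2k
    x[1::2] = (alpha - beta) / np.sqrt(2)   # sample at n = 2k + 1
    return x

x = np.array([4.0, 2.0, 5.0, 5.0, -1.0, 3.0])
alpha, beta = haar_coefficients(x)
assert np.allclose(haar_reconstruct(alpha, beta), x)                 # perfect reconstruction
assert np.isclose(np.sum(alpha**2) + np.sum(beta**2), np.sum(x**2))  # Parseval equality
```

Because the basis is orthonormal, the energy of the coefficients equals the energy of the sequence, which the last line checks.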

From the same argument we just went through, we see what would happen if we were to remove one tile/basis function, say $\varphi_{2k}$: those sequences

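We can now use all the machinery we developed in Chapter 1, and look at projections, the matrix view, etc. For example, the matrix representing this basis is

$$\Phi = \begin{bmatrix} \cdots & \varphi_0 & \varphi_1 & \varphi_2 & \varphi_3 & \cdots \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} \ddots & & & & & \\ & 1 & 1 & 0 & 0 & \\ & 1 & -1 & 0 & 0 & \\ & 0 & 0 & 1 & 1 & \\ & 0 & 0 & 1 & -1 & \\ & & & & & \ddots \end{bmatrix}. \qquad (1.7)$$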
The matrix is block diagonal, with blocks of size $2 \times 2$.

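From (1.6), we can immediately see that the Haar orthonormal basis projects onto two subspaces: the lowpass space $V$, spanned by the lowpass template $\varphi_0$ and its even shifts, and the highpass space $W$, spanned by the highpass template $\varphi_1$ and its even shifts:

$$V = \operatorname{span}(\{\varphi_{0,n-2k}\}_{k\in\mathbb{Z}}), \qquad W = \operatorname{span}(\{\varphi_{1,n-2k}\}_{k\in\mathbb{Z}}). \qquad (1.8)$$

From (1.6), the lowpass and highpass projections are:

$$x_V = \begin{bmatrix} \cdots & \tfrac{1}{2}(x_0 + x_1) & \tfrac{1}{2}(x_0 + x_1) & \tfrac{1}{2}(x_2 + x_3) & \tfrac{1}{2}(x_2 + x_3) & \cdots \end{bmatrix}^T, \qquad (1.9a)$$

$$x_W = \begin{bmatrix} \cdots & \tfrac{1}{2}(x_0 - x_1) & -\tfrac{1}{2}(x_0 - x_1) & \tfrac{1}{2}(x_2 - x_3) & -\tfrac{1}{2}(x_2 - x_3) & \cdots \end{bmatrix}^T. \qquad (1.9b)$$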



Indeed, $x_V$ is a smoothed version of $x$ where every two samples have been replaced by their average, while $x_W$ is the detail version of $x$ where every two samples have been replaced by half their difference (and its negative). As the sequences in $V$ and $W$ are orthogonal, and the expansion is a basis,

$$\ell^2(\mathbb{Z}) = V \oplus W. \qquad (1.10)$$
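The projections (1.9a), (1.9b) and the decomposition (1.10) can be verified with a finite section of the block-diagonal matrix (1.7). Truncating to an even length $N$ is our assumption, made only so the matrix fits in memory; the even-indexed columns span $V$ and the odd-indexed columns span $W$.

```python
import numpy as np

def haar_matrix(N):
    """Finite N x N section of (1.7): 2x2 blocks [[1, 1], [1, -1]] / sqrt(2) on the diagonal."""
    block = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    return np.kron(np.eye(N // 2), block)

N = 8
Phi = haar_matrix(N)
assert np.allclose(Phi.T @ Phi, np.eye(N))      # columns are orthonormal

x = np.arange(N, dtype=float) ** 2
Phi_V, Phi_W = Phi[:, 0::2], Phi[:, 1::2]       # columns phi_2k and phi_2k+1
x_V = Phi_V @ (Phi_V.T @ x)                     # lowpass projection (1.9a)
x_W = Phi_W @ (Phi_W.T @ x)                     # highpass projection (1.9b)

pair_mean = np.repeat((x[0::2] + x[1::2]) / 2, 2)
assert np.allclose(x_V, pair_mean)              # each pair replaced by its average
assert np.allclose(x_V + x_W, x)                # x = x_V + x_W
assert np.isclose(x_V @ x_W, 0.0)               # V and W are orthogonal
```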

In the next chapter, we will show efficient ways of implementing this, and 
other, more general orthonormal bases, using filter banks. 

Examples of Real-World Signals 

To motivate the construction of specific tools in Part II, we now show a few real- 
world signals with associated problems, and broadly outline possible solutions. 

The Transient Nature of Polyphonic Music We start with one of the most famous and popular pieces of Western classical music, Ravel's Bolero. It starts with a single instrument, the piccolo flute, almost whispering the theme, and ends with the full, 120-instrument orchestra thundering the finale. Clearly, the local characteristics of the piece evolve dramatically from beginning to end, as illustrated by the 15-minute time-domain display of the acoustical signal in Figure 1.3(a). While the Fourier transform of that sequence, shown in Figure 1.3(b), exhibits a number of spectral peaks corresponding to some key harmonic structures, the signature evolution of the Bolero from the introduction to the grand finale is lost.

To understand the local behavior, we look at a short piece of the Bolero, from 12 to 15 sec, and its Fourier transform. Figure 1.3(c) shows the local behavior in time, while Figure 1.3(d) shows the local behavior in frequency, that is, those frequencies that are active in this part of the piece. We immediately come across a practical issue: extracting a part of the Bolero means multiplying the time-domain sequence with a rectangular window, as was done in Example 2.4. In the frequency domain, this windowing amounts to a convolution with the Fourier transform of the window, smoothing the spectrum. This is the cost we pay for localizing the Bolero in time.
Figure 1.3: Ravel's entire Bolero in (a) time and (b) frequency domains. A 4-sec segment in (c) time and (d) frequency domains.



We have seen this effect in Example 2.4, in the time domain only. Figure 1.4 shows it for a sinusoid and the two different windows from Example 2.4, both in time and in frequency. With no windowing, the DFT contains a double peak (because the sequence is real; red stems in the figure). With a sharp window in time, the rectangular window (2.12), the DFT is the convolution of the two peaks with the sinc in the frequency domain, creating a diluted version of the peaks as in Figure 1.4(a). With a smoother window in time, the raised cosine window (2.15), the DFT is sharper and closer to the peaks, as in Figure 1.4(b).
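A small numerical sketch of this comparison. As assumptions, the raised cosine window is taken to be the Hann-type window returned by `np.hanning` (the book's (2.15) may differ in normalization or endpoints), and the DFT length of 128, obtained by zero padding, is our choice. The sketch measures what fraction of the spectral energy lies far from the two peaks at $\pm\pi/8$.

```python
import numpy as np

n0, N = 26, 128                         # window length (Figure 1.4) and zero-padded DFT length (our choice)
n = np.arange(n0)
x = np.sin(np.pi / 8 * n + np.pi / 2)   # the sinusoid of Figure 1.4

windows = {
    "rectangular": np.ones(n0),
    "raised cosine": np.hanning(n0),    # assumed Hann-type raised cosine, not necessarily (2.15)
}

omega = 2 * np.pi * np.fft.fftfreq(N)   # frequency of each DFT bin, in [-pi, pi)
# Bins farther than pi/4 from both peaks at +pi/8 and -pi/8
far = np.minimum(np.abs(omega - np.pi / 8), np.abs(omega + np.pi / 8)) > np.pi / 4

leakage = {}
for name, w in windows.items():
    X = np.fft.fft(w * x, N)            # DFT of the windowed sequence w_n x_n
    energy = np.abs(X) ** 2
    leakage[name] = energy[far].sum() / energy.sum()
    print(f"{name:>13}: fraction of energy far from the peaks = {leakage[name]:.2e}")
```

The smoother window trades a somewhat wider main lobe for much weaker sidelobes, which is why its DFT looks closer to the two ideal peaks.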

Windowing leads us to the notion of a local, or short-time, Fourier transform, introduced in Chapter 8. The idea is to apply a moving window over the signal, followed by a Fourier transform. The resulting local Fourier spectrum now has two indices: one for the location of the window, or a time index, and one for the local frequency.
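A minimal short-time Fourier transform sketch. The Hann window, the hop size, the sampling rate, and the toy test signal are illustrative assumptions, not values from the text. The output array has the two indices just described: the window position $m$ (time) and the frequency index $k$.

```python
import numpy as np

def stft(x, window, hop):
    """Slide `window` over x in steps of `hop`; return an array indexed [m, k]."""
    L = len(window)
    n_frames = 1 + (len(x) - L) // hop
    frames = np.stack([x[m * hop : m * hop + L] * window for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)      # one DFT per window position (real input)

# Toy signal whose frequency jumps halfway through
fs = 8000
t = np.arange(2 * fs) / fs
x = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

S = stft(x, np.hanning(512), hop=256)
freqs = np.fft.rfftfreq(512, d=1 / fs)
dominant = freqs[np.argmax(np.abs(S), axis=1)]   # strongest frequency in each window
print(S.shape, dominant[0], dominant[-1])
```

A single Fourier transform of `x` would show both frequencies with no indication of when each occurs; the per-window maxima recover the change over time.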

This discussion leads us naturally to analysis in the time-frequency plane, shown in Figure 6.3. The frequency analysis is now localized around a given time location. A time-frequency plot of Ravel's Bolero is shown in Figure 1.5. Now one clearly sees the evolution of the piece in this time segment, in terms of spectral content, rhythm, and intensity.

The idea of time-frequency analysis is also central to the MP3 audio-coding standard, where it is performed using an orthonormal time-frequency basis (of the kind we will construct in Chapter 8). Following this analysis, the expansion coefficients in the orthonormal basis are carefully quantized so as to minimize perceptual distortion. Overall, the bit rate can be reduced by almost an order of magnitude without audible distortion.

Figure 1.4: The DFT of a sinusoidal sequence $x_n = \sin((\pi/8)n + \pi/2)$ (red stems) from Figure 2.2 and the DFTs of its windowed versions $w_n x_n$ using (a) the rectangular window (2.12) and (b) the raised cosine window (2.15), both of length $n_0 = 26$ (stem plots).

Figure 1.5: Time-frequency analysis of a segment of Ravel's Bolero.

Visions of an Image In our music-analysis example, we mapped a one-dimensional sequence into a two-dimensional one, an image. Starting with an image, we might end up with an even higher-dimensional representation. To explore this, we look again at All is Vanity, shown in Figure 6.2(a).

One very standard representation, often used in computer vision, is the image pyramid. It consists of computing a lowpass approximation (or projection onto the space of lowpass images) as well as a difference image. The latter is an approximation to a derivative of the image, thus enhancing edges, while being close to zero in smooth parts. The block diagram of one step of such a decomposition is shown in Figure 1.6(a), while the usual projection and difference are displayed in Figure 1.6(b). Of course, one can iterate this scheme. Instead of showing the projections, one usually shows the coefficients, that is, the downsampled images, both lowpass and


a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavclets.org 



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovaccvic, and V. K. Goyal 



Intermezzo 



561 



Figure 1.6: Pyramid decomposition. (a) Block diagram. (b) Geometry of decomposition.

Figure 1.7: Pyramid decomposition of "All is Vanity". (a) Sequence of lowpass and downsampled images. (b) Respective difference images.

Figure 1.8: The notion of resolution. (a)-(d) Original as well as lowpass versions. (e)-(h) Original as well as noisy versions.



difference. This is shown in Figure 1.7, where we see recursively the successive lowpass filtered and downsampled images, as well as the difference between successive levels. As is clear from the sequence of lowpass versions, the "lady in front of the mirror" soon gives way to a skull... Because the pyramid decomposition can be perfectly inverted (as will be shown in Chapter 10), the difference images contain all it takes to go from the skull to a young lady. A word of caution is necessary here: typically, the lowpass version is a good predictor for the full-resolution image. This is clearly not the case here, since the intent of the painter was to create a visual illusion, or a perceptual tension between the sharp and the blurred versions of the image.

Lowpass filtering, or reducing bandwidth, can be seen as a loss of resolution. A similar effect can be achieved by reducing the signal-to-noise ratio, that is, by adding noise: perceptually, we will have trouble identifying the relevant information because of the noise. Figure 1.8 compares reducing bandwidth with adding noise, indicating the similarity of the two effects. Intuitively, the notion of "resolution" is therefore related to the ability to extract information. In this case, the underlying information is the "lady in front of the mirror", and too much noise or insufficient bandwidth both preclude our ability to extract this information.

The pyramid scheme seen above is both simple and intuitive. Its only drawback is that it is redundant; for example, one step as in Figure 1.6(a) maps $N$ samples into $N/2$ lowpass coefficients and $N$ difference samples, an increase of 50% (in two dimensions, the increase is smaller, since the lowpass coefficients amount to $N^2/4$ for $N^2$ input pixels).

Figure 1.9: Wavelet decomposition. (a) Coefficients in the various channels. (b) Projection onto the various subspaces.
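One pyramid step, as described for Figure 1.6(a), can be sketched in one dimension. As an assumption, we use the crudest possible filters (pairwise averaging for the lowpass, sample repetition for the interpolation); practical pyramids use smoother filters. The sketch exhibits the 50% redundancy and shows that the step is inverted simply by adding the difference back to the interpolated coarse version.

```python
import numpy as np

def pyramid_step(x):
    """One pyramid level with Haar-like filters: coarse signal and difference signal."""
    coarse = (x[0::2] + x[1::2]) / 2      # lowpass filter + downsample by 2
    prediction = np.repeat(coarse, 2)     # upsample + interpolate (sample repetition)
    difference = x - prediction           # full-length difference signal
    return coarse, difference

def pyramid_inverse(coarse, difference):
    return np.repeat(coarse, 2) + difference

x = np.random.default_rng(1).standard_normal(16)
coarse, difference = pyramid_step(x)
assert np.allclose(pyramid_inverse(coarse, difference), x)   # perfect reconstruction
print(len(x), "samples ->", len(coarse) + len(difference), "coefficients")  # 16 samples -> 24 coefficients
```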

If one chooses a wavelet decomposition instead, it is possible to have an orthonormal basis and an expansion therein. The details are the topic of Chapter 9; suffice it to say that the lowpass version is similar to the pyramid case, while the difference channel (in two dimensions) is now made up of three channels with the combinations of highpass/lowpass in the horizontal and vertical directions, using separable two-dimensional filters. Since each channel is downsampled by 4, we have no redundancy.


Figure 1.9 shows a wavelet decomposition of "All is Vanity", where the decomposition is iterated on the smoothed, lowpass version. In part (a), the coefficients are shown, while in part (b), the projections onto the various subspaces are depicted. Similarly to the pyramid case seen earlier, the lady vanishes to leave a skull as we go down the repeated lowpass branch, or the projection onto the lowpass subspace. The various highpass branches enhance vertical, horizontal, or diagonal edges, depending on which highpass filter is involved. This wavelet decomposition of images is at the heart of the JPEG2000 image compression standard. In addition to the decomposition, or analysis, the compression involves sophisticated quantization and bit allocation to reach compression factors of an order of magnitude with few visible artifacts. For reconstruction, since it is an orthonormal decomposition, one simply uses the transposed operator, or upsampling and interpolation.
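A one-level, separable two-dimensional Haar decomposition illustrates both claims above. Haar filters are our choice for brevity (JPEG2000 itself uses longer wavelet filters), and a random array stands in for the image. Filtering and downsampling along columns and then rows yields four channels of $N^2/4$ coefficients each, so there is no redundancy; since the one-dimensional transform is orthonormal, applying the transposed operator reconstructs the image.

```python
import numpy as np

def haar_analysis_matrix(N):
    """Orthonormal one-level Haar analysis: first N/2 rows average, last N/2 rows difference."""
    A = np.zeros((N, N))
    for k in range(N // 2):
        A[k, 2 * k : 2 * k + 2] = [1, 1]
        A[N // 2 + k, 2 * k : 2 * k + 2] = [1, -1]
    return A / np.sqrt(2)

N = 8
A = haar_analysis_matrix(N)
img = np.random.default_rng(2).random((N, N))   # stand-in for an image

C = A @ img @ A.T         # filter + downsample along columns, then along rows
h = N // 2
low_low = C[:h, :h]       # lowpass in both directions: the coarse image
low_high = C[:h, h:]      # vertical lowpass, horizontal highpass: vertical edges
high_low = C[h:, :h]      # vertical highpass, horizontal lowpass: horizontal edges
high_high = C[h:, h:]     # highpass in both directions: diagonal detail

channels = (low_low, low_high, high_low, high_high)
assert sum(c.size for c in channels) == img.size    # no redundancy
assert np.allclose(A.T @ C @ A, img)                # the transpose reconstructs
```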

If there is such a thing as JPEG2000, there must be a plain JPEG as a precursor. Indeed, a simpler image coding standard preceded the one based on wavelets, and its simplicity and good performance¹⁰⁴ keep JPEG the front-runner image coding standard. Interestingly, it is in some sense closer to a short-time Fourier analysis than to a wavelet decomposition. In short, an image is sliced up into blocks of M by M pixels, and each block is transformed by a two-dimensional DCT, which is similar to a real-valued discrete Fourier transform. In the DCT domain, smart quantization and entropy coding achieve good compression.

Since cutting into blocks is like applying a window of size M by M, and taking a DCT is similar to taking a DFT, we indeed have an analysis akin to a short-time Fourier transform. Figure 1.10 shows a DCT analysis of our now classic image: the original in (a), the individually transformed blocks in (b), and, in (c), the reordered coefficients, where coefficients with the same frequency indices (i, j) from each block are gathered together. Sure enough, the skull appears clearly in the coefficients of order (0, 0), or local averages in each block.
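A sketch of the block-DCT analysis and the coefficient reordering behind Figure 1.10. Assumptions: $M = 8$ (the block size of baseline JPEG), an orthonormal DCT-II built explicitly as a matrix, and image dimensions that are multiples of $M$; quantization and entropy coding are omitted, and a random array stands in for the image. With this normalization, the (0, 0) coefficient of each block equals $M$ times the block mean, which is why the gathered (0, 0) coefficients look like a downsampled version of the original.

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(M)[:, None]
    n = np.arange(M)[None, :]
    C = np.sqrt(2 / M) * np.cos(np.pi * (2 * n + 1) * k / (2 * M))
    C[0, :] /= np.sqrt(2)
    return C

def block_dct(img, M=8):
    """2D DCT of every M x M block, reordered as [i, j, block_row, block_col]."""
    C = dct_matrix(M)
    R, S = img.shape[0] // M, img.shape[1] // M
    blocks = img.reshape(R, M, S, M).transpose(0, 2, 1, 3)  # [block_row, block_col, row, col]
    coeffs = C @ blocks @ C.T                               # C B C^T for each block B
    return coeffs.transpose(2, 3, 0, 1)                     # gather equal (i, j) together

M = 8
img = np.random.default_rng(3).random((32, 48))
coeffs = block_dct(img, M)
dc_image = coeffs[0, 0]                      # (0, 0) coefficient of every block: a 4 x 6 image
block_means = img.reshape(4, M, 6, M).mean(axis=(1, 3))
assert np.allclose(dc_image, M * block_means)
```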



¹⁰⁴ As well as a clearer patent situation.
Figure 1.10: Transform used in JPEG. (a) Original. (b) DCT coefficients. (c) Reordered DCT coefficients.

Figure 1.11: Communication or signaling over a channel. Based on a sequence $\{\alpha_i\}$, a signal is synthesized to be sent over a channel, typically a convolution. At the receiver, estimates $\{\hat{\alpha}_i\}$ of the coefficients are retrieved.



Communications with the Short-Time Fourier Transform The two previous examples were representation problems, that is, taking a signal and finding a good expansion space to achieve a more compact representation.

For communication, we look at a dual problem, namely signaling. We want to synthesize a signal based on some given coefficients $\{\alpha_i\}$ that carry information. This synthesized signal is then sent over a channel, typically represented by a convolution, possibly slowly varying over time, and received at a decoder where the sequence $\{\alpha_i\}$ or an approximation $\{\hat{\alpha}_i\}$ needs to be retrieved. The situation is schematically shown in Figure 1.11. If the channel is well modeled by a convolution with known impulse response $h(t)$, it is natural to use complex sinusoids for signaling. Since they are eigenfunctions of the convolution operator, the various $\{\alpha_i\}$ are not mixed by the channel, but simply weighted by the Fourier transform of the impulse response, according to the Fourier convolution theorem (3.64). For simplicity, we recall the simplest version in matrix algebra. Assume a vector $x \in \mathbb{C}^N$ and a circular convolution matrix $H$. If we use $x$ directly as input to the channel, we obtain as output

$$y = Hx,$$

that is, a convolved version of the various input values. Instead, we use the DFT basis vectors for signaling, each weighted by an entry of $x$, or,

$$X = Fx.$$

The effect of the channel is to weight the vectors of $F$ by the DFT of the filter,

$$Y = HX = HFx \overset{(a)}{=} F\Lambda x,$$

where (a) follows from ... and $\Lambda$ is the diagonal matrix of DFT coefficients of the filter. Assuming $\Lambda$ is known and invertible (no zeros in the DFT spectrum), we can retrieve the original values by inverse DFT and equalization of the channel,

$$\hat{x} = \Lambda^{-1}F^{-1}Y,$$

where $\hat{x} = x$ in this idealized, noiseless case. This "signaling by Fourier basis vectors" is conceptually depicted in Figure 1.12.
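A numerical sketch of DFT signaling over a circular-convolution channel. The convention is our assumption: $F$ is taken to be the unitary matrix whose columns are the Fourier basis vectors $e^{j2\pi kn/N}/\sqrt{N}$, so that $F^{-1} = F^H$ and $\Lambda = \operatorname{diag}(\texttt{np.fft.fft}(h))$; with other DFT conventions, the entries of $\Lambda$ are permuted or conjugated. The channel taps are hypothetical.

```python
import numpy as np

def circulant(h):
    """Circular-convolution matrix with H[n, m] = h[(n - m) mod N]."""
    N = len(h)
    idx = np.arange(N)
    return h[(idx[:, None] - idx[None, :]) % N]

N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # information-bearing values
h = np.zeros(N)
h[:3] = [1.0, 0.5, 0.25]                                     # hypothetical channel taps

n = np.arange(N)
F = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)    # unitary; column k is Fourier vector k
H = circulant(h)
Lam = np.diag(np.fft.fft(h))                                 # DFT coefficients of the channel

X = F @ x                                     # transmitted signal: Fourier vectors weighted by x
Y = H @ X                                     # circular-convolution channel, no noise
x_hat = np.linalg.solve(Lam, F.conj().T @ Y)  # undo F (F^{-1} = F^H), then equalize

assert np.allclose(H @ F, F @ Lam)            # eigenvector property: HF = F Lambda
assert np.allclose(x_hat, x)
```

The chosen taps give a frequency response without zeros on the unit circle, so $\Lambda$ is invertible, as the text requires.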
Figure 1.12: Block diagram of DFT signaling over a circular convolution channel.

Figure 1.13: Time-frequency signaling or modulation. Basis functions localized both in time and frequency are used to carry information over a slowly varying convolution channel.



Now, real life is more complicated than the abstracted version above, which assumed discrete-time, periodic sequences. The idea is to use localized Fourier basis vectors that live within a certain time window. This is again a short-time Fourier transform view of the problem. While these localized basis functions no longer satisfy an exact eigenfunction property, they do so approximately, and well enough to carry information over a channel. In addition, if the channel is time varying, as in mobile communication, the finite time support of the basis functions becomes advantageous: one can apply a local equalization to compensate for the channel effects at the time of the localized signaling, something long basis functions cannot do. In sum, time-frequency signaling achieves both time locality and an approximate eigenfunction property. Combined with a localized equalization to deal with time-varying channels, it permits efficient transmission over many communication channels. As a case in point, orthogonal frequency division multiplexing (OFDM) is the basis for many successful communication methods, for example, Wi-Fi.

Figure 1.13 schematically shows a time-frequency signaling method that underlies OFDM. Of course, a practical system involves many more components, for example, pilots for estimating the channel: dedicated time-frequency basis functions used to probe the time-varying channel.

Predicting the Future Using Orthogonal Projections Here something about LPC. 
TBD. 

That's all, folks! 
Representations for Signal Processing



The second part of the book develops a variety of signal representations and brings all the tools together, leading to sparse signal processing. It covers both discrete- and continuous-time signals; sets of vectors that form bases and frames; and Fourier and wavelet constructions. We progress from discrete time in Chapters 7–10 to continuous time in Chapters 11–12 and, finally, to Chapter 13, which gives an introduction and overview of the main ingredients of sparse signal processing.

Within these sets of chapters, we cover Fourier techniques first (Chapters 8 and 11) and wavelet techniques second (Chapters 9 and 12). Representations using frames (oversampled representations) exist in both discrete and continuous time and in both Fourier and wavelet style; these are covered in Chapters 10–12.

Chapter 7: Filter Banks: Building Blocks of Time-Frequency Expansions, develops two-channel filter banks as a computationally efficient tool both for computing basis expansions of discrete-time signals and for the reconstruction of these signals. The representations associated with two-channel filter banks are simultaneously the simplest of Fourier and wavelet representations on $\ell^2(\mathbb{Z})$; thus, the detailed developments in this chapter lay the groundwork for Chapters 8–10.

Chapter 8: Local Fourier Bases on Sequences, develops local Fourier representations for $\ell^2(\mathbb{Z})$ through a generalization from two channels to N channels. It yields local Fourier series representations of sequences, also known as windowed Fourier, Gabor, or short-time Fourier representations. They inherently split $\ell^2(\mathbb{Z})$ into N bands equally spaced in frequency. While there exist no good local Fourier bases (apart from minimally short ones), there exist good local cosine bases, where the complex-exponential modulation is replaced by cosine modulation. Moreover, there exist good local Fourier frames, a topic covered in Chapter 10.

Chapter 9: Wavelet Bases on Sequences, develops wavelet representations for $\ell^2(\mathbb{Z})$ through a generalization from two channels to tree-structured filter banks. These tree-structured filter banks are not arbitrary; rather, they build unbalanced trees by iterating on the lowpass (coarse) branch only. This results in an octave-band filter bank, or a discrete wavelet transform.
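A sketch of such an unbalanced tree. The two-channel split uses the Haar filters (our choice of filters), and only the lowpass output is split again; the number of levels $J$ is a free parameter, and the input length must be divisible by $2^J$. Each level halves the band that is split further, which produces the octave-band structure.

```python
import numpy as np

def haar_split(x):
    """Two-channel Haar analysis: lowpass and highpass outputs, each downsampled by 2."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_merge(low, high):
    """Two-channel Haar synthesis, inverting haar_split."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

def dwt(x, J):
    """Octave-band decomposition: iterate on the lowpass branch only, J times."""
    details = []
    low = np.asarray(x, dtype=float)
    for _ in range(J):
        low, high = haar_split(low)
        details.append(high)          # finest band first
    return low, details

def idwt(low, details):
    for high in reversed(details):    # undo the coarsest split first
        low = haar_merge(low, high)
    return low

x = np.random.default_rng(4).standard_normal(64)
low, details = dwt(x, J=3)
print([len(d) for d in details], len(low))   # [32, 16, 8] 8
assert np.allclose(idwt(low, details), x)
```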

Chapter 10: Local Fourier and Wavelet Frames on Sequences, develops frame representations for $\ell^2(\mathbb{Z})$, in both Fourier and wavelet flavors, through a generalization from critically sampled to oversampled filter banks. The redundancy inherent in the representation allows for significant freedom in design. Not only can we look for the best expansion, we can also look for the best expansion coefficients given a fixed expansion and under desired constraints (sparsity being one example).

Chapter 11: Local Fourier Transforms, Frames and Bases on Functions, develops local Fourier representations for $\mathcal{L}^2(\mathbb{R})$. The aim follows that of Chapter 8, but for functions. We look for ways to localize the analysis provided by the Fourier transform by windowing the complex exponentials. The chapter starts with the most redundant representation, the local Fourier transform, and then samples it to obtain local Fourier frames. With critical sampling, we then try for local Fourier bases, where, again, bases with simultaneously good time and frequency localization do not exist, a result known as the Balian–Low theorem. Cosine local Fourier bases do exist, as do wavelet ones, which we discuss in the next chapter.

Chapter 12: Wavelet Bases, Frames and Transforms on Functions, develops wavelet representations for $\mathcal{L}^2(\mathbb{R})$. It shows how to overcome the roadblock from Chapter 11 encountered when trying to construct local Fourier bases with reasonable joint time and frequency localization. While bases are possible with cosine, instead of complex-exponential, modulation, we can do even better. In this chapter, we start with a whole wealth of wavelet bases, and then go in the direction of increasing redundancy, by building frames and finally the continuous wavelet transform.

Chapter 13: Approximation, Estimation, and Compression, develops the main ingredients of sparse signal processing, namely approximation methods and their performance, estimation procedures based on adapted representations (for example, denoising), compression methods and their performance, and finally a glimpse at inverse problems, including sparse sampling methods. As the aim of many signal processing algorithms is to transform high-dimensional problems into lower-dimensional ones, by either linear or nonlinear methods, the basic axiom is that there exists a representation where the initial high-dimensional problem has a low-dimensional solution. This sweeping statement needs some qualification, since such a solution is clearly not always known. However, we show many cases where a solution of this kind exists, and give a brief overview of them. The classic Karhunen–Loève expansion of stochastic processes is an example where linear approximation is optimal for least-squares approximation. The success of compression methods using the DCT and wavelets relies on nonlinear approximation in bases. And some recent solutions for inverse problems rely on a sparsity prior in a basis or a frame representation.
The aim of this chapter is to build discrete-time bases with desirable time-frequency features and structure that enable tractable analysis and efficient algorithmic implementation. We achieve these goals by constructing bases via filter banks.

Using filter banks provides an easy way to understand the relationship between analysis and synthesis operators, while, at the same time, making their efficient implementation obvious. Moreover, filter banks are at the root of the constructions of wavelet bases in Chapters 9 and 12. In short, together with discrete-time filters and the FFT, filter banks are among the most basic tools of signal processing.

This chapter deals exclusively with two-channel filter banks, since (1) they are the simplest; (2) they reveal the essence of the N-channel ones; and (3) they are used as building blocks for more general bases. We focus first on the orthogonal case, which is the most structured and has the easiest geometric interpretation. Due to its importance in practice, we follow with the discussion of the biorthogonal case. We consider real-coefficient filter banks exclusively; pointers to complex-coefficient ones, as well as to various generalizations, such as N-channel filter banks, multidimensional filter banks, and transmultiplexers, are given in Further Reading.

7.1 Introduction 

Implementing a Haar Orthonormal Basis Expansion 

At the end of the previous chapter, we constructed an orthonormal basis for $\ell^2(\mathbb{Z})$ that possesses structure in terms of time and frequency localization properties (it serves as an almost perfect localization tool in time, and a rather rough one in frequency) and is efficient (it is built from two template sequences, one lowpass and the other highpass, and their shifts). This was the so-called Haar basis.

What we want to do now is implement that basis using signal processing machinery. We first rename our template basis sequences from (1.3) and (1.5) as:

$$g_n = \varphi_{0,n} = \frac{1}{\sqrt{2}}(\delta_n + \delta_{n-1}), \qquad (7.1a)$$

$$h_n = \varphi_{1,n} = \frac{1}{\sqrt{2}}(\delta_n - \delta_{n-1}). \qquad (7.1b)$$

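This is done both for simplicity and because it is the standard way these sequences are denoted. We start by rewriting the reconstruction formula (1.6) as

$$\begin{aligned} x_n &= \sum_{k\in\mathbb{Z}} \underbrace{\langle x, \varphi_{2k}\rangle}_{\alpha_k}\, \varphi_{2k,n} + \sum_{k\in\mathbb{Z}} \underbrace{\langle x, \varphi_{2k+1}\rangle}_{\beta_k}\, \varphi_{2k+1,n} \\ &= \sum_{k\in\mathbb{Z}} \alpha_k \underbrace{\varphi_{2k,n}}_{g_{n-2k}} + \sum_{k\in\mathbb{Z}} \beta_k \underbrace{\varphi_{2k+1,n}}_{h_{n-2k}} \\ &= \sum_{k\in\mathbb{Z}} \alpha_k\, g_{n-2k} + \sum_{k\in\mathbb{Z}} \beta_k\, h_{n-2k}, \end{aligned} \qquad (7.2)$$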

where we have renamed the basis functions as in (7.1), and denoted the expansion coefficients as

$$\langle x, \varphi_{2k}\rangle = \langle x_n, g_{n-2k}\rangle_n = \alpha_k, \qquad (7.3a)$$

$$\langle x, \varphi_{2k+1}\rangle = \langle x_n, h_{n-2k}\rangle_n = \beta_k. \qquad (7.3b)$$

Then, recognize each sum in (7.2) as the output of upsampling followed by filtering (2.198), with the input sequences being $\alpha_k$ and $\beta_k$, respectively. Thus, the first sum in (7.2) can be implemented as the input sequence $\alpha$ going through an upsampler by 2 followed by filtering by $g$, and the second as the input sequence $\beta$ going through an upsampler by 2 followed by filtering by $h$.

By the same token, we can identify the computation of the expansion coefficients in (7.3) as (2.195), that is, both $\alpha$ and $\beta$ sequences can be obtained using filtering by $g_{-n}$ followed by downsampling by 2 (for $\alpha_k$), or filtering by $h_{-n}$ followed by downsampling by 2 (for $\beta_k$).

We can put together the above operations to yield a two-channel filter bank implementing a Haar orthonormal basis expansion as in Figure 7.1(a). The left part, which computes the expansion coefficients, is termed an analysis filter bank, while the right part, which computes the projections, is termed a synthesis filter bank.

Figure 7.1: A two-channel analysis/synthesis filter bank. (a) Block diagram, where an analysis filter bank is followed by a synthesis filter bank. In the orthogonal case, the impulse responses of the analysis filters are time-reversed versions of the impulse responses of the synthesis filters. The filter $g$ is typically lowpass, while the filter $h$ is typically highpass. (b) Frequency responses of the two Haar filters computing averages and differences, showing the decomposition into low- and high-frequency content.

As before, once we have identified all the appropriate multirate components,
we can examine the Haar filter bank via matrix operations (linear operators). For
example, in matrix notation, the analysis process (7.3) can be expressed as

$$ \begin{bmatrix} \vdots \\ \alpha_0 \\ \beta_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \end{bmatrix} = \Phi^T x, \tag{7.4} $$

and the synthesis process (7.2) as

$$ x = \Phi \begin{bmatrix} \vdots \\ \alpha_0 \\ \beta_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \end{bmatrix} = \Phi \Phi^T x, \tag{7.5} $$

or

$$ \Phi \Phi^T = I. \tag{7.6} $$



Of course, the matrix Φ is the same matrix we have seen in (1.7). Moreover, from
(7.6), it is a unitary matrix, which, as we know from Chapter 1 (Chapter at a Glance),
implies that the Haar basis is an orthonormal basis (as we have already shown in
Chapter 6). Table 7.8 gives a summary of the Haar filter bank in various domains.



Implementing a General Orthonormal Basis Expansion 

What we have seen for the Haar orthonormal basis is true in general; we can construct
an orthonormal basis for ℓ²(Z) using two template basis sequences and their
even shifts. As we have seen, we can implement such an orthonormal basis using a
two-channel filter bank, consisting of downsamplers, upsamplers, and filters g and h.
Let g and h be two real-coefficient, causal filters,¹⁰⁵ where we implicitly assume that
these filters have certain time and frequency localization properties, as discussed in
Chapter 6 (g is lowpass and h is highpass). The synthesis (7.5) generalizes to
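$$ x_n = \sum_{k\in\mathbb{Z}} \alpha_k g_{n-2k} + \sum_{k\in\mathbb{Z}} \beta_k h_{n-2k}, \quad \text{that is,} \quad x = \Phi \begin{bmatrix} \vdots \\ \alpha_0 \\ \beta_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \end{bmatrix}, \tag{7.7} $$

with the basis matrix Φ built as before, its columns now being the even shifts
..., g_n, h_n, g_{n-2}, h_{n-2}, .... To have an orthonormal basis, the basis sequences
{φ_k}_{k∈Z} (the even shifts of the template sequences g and h) must form an orthonormal set
as in (1.83); equivalently, Φ must be unitary, implying its columns are orthonormal: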



$$ \langle g_n, g_{n-2k}\rangle_n = \delta_k, \qquad \langle h_n, h_{n-2k}\rangle_n = \delta_k, \qquad \langle g_n, h_{n-2k}\rangle_n = 0. \tag{7.8} $$



We have seen in (2.209) that such filters are called orthogonal; how to design them
is a central topic of this chapter.
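Conditions (7.8) are easy to test numerically for finitely supported filters. The sketch below, in Python with a hypothetical helper `satisfies_7_8`, assumes both filters start at n = 0 and checks every even shift at which the supports can overlap (all other inner products vanish trivially). It only checks the conditions; it is not a design method.

```python
import math


def inner_shift(a, b, k):
    """<a_n, b_{n-2k}>_n for finite sequences a, b that both start at n = 0."""
    return sum(a[n] * b[n - 2 * k] for n in range(len(a)) if 0 <= n - 2 * k < len(b))


def satisfies_7_8(g, h, tol=1e-12):
    """Check the three conditions of (7.8) for every even shift where supports overlap."""
    span = max(len(g), len(h))
    for k in range(-(span // 2) - 1, span // 2 + 2):
        target = 1.0 if k == 0 else 0.0
        if abs(inner_shift(g, g, k) - target) > tol:   # g orthonormal to its even shifts
            return False
        if abs(inner_shift(h, h, k) - target) > tol:   # h orthonormal to its even shifts
            return False
        if abs(inner_shift(g, h, k)) > tol:            # g orthogonal to all even shifts of h
            return False
    return True


s = 1 / math.sqrt(2)
print(satisfies_7_8([s, s], [s, -s]))  # Haar pair (7.1): True
print(satisfies_7_8([s, s], [s, s]))   # h = g is not orthogonal to g: False
```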

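As we are building an orthonormal basis, computing the expansion coefficients
of an input sequence means taking the inner product between that sequence and
each basis sequence. In terms of the orthonormal set given by the columns of Φ,

$$ \begin{bmatrix} \vdots \\ \alpha_0 \\ \beta_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \end{bmatrix} = \Phi^T x, \quad \text{with} \quad \alpha_k = \langle x_n, g_{n-2k}\rangle_n, \quad \beta_k = \langle x_n, h_{n-2k}\rangle_n. \tag{7.9} $$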



105 While causality is not necessary to construct a filter bank, we impose it later and it improves 
readability here. We stress again that we deal exclusively with real-coefficient filters. 
As in the Haar case, this can be implemented with convolutions by g_{-n} and h_{-n},
followed by downsampling by 2: an analysis filter bank as in Figure 7.1(a). In filter
bank terms, the representation of x in terms of a basis (or frame) is called perfect
reconstruction.

Thus, what we have built is as in Chapter 6: an orthonormal basis with structure
(time and frequency localization properties) as well as an efficient implementation
guaranteed by the filter bank. As in the Haar case, this structure is seen in the
subspaces V and W onto which the orthonormal basis projects; we implicitly assume
that V is the space of coarse (lowpass) sequences and W is the space of detail
(highpass) sequences. Figure 7.3 illustrates this: a synthetic sequence with
features at different scales is split into lowpass and highpass components. These
subspaces are spanned by the lowpass template g and its even shifts (V) and the
highpass template h and its even shifts (W) as in (1.8):



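$$ V = \operatorname{span}(\{\varphi_{0,n-2k}\}_{k\in\mathbb{Z}}) = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}}), \tag{7.10a} $$

$$ W = \operatorname{span}(\{\varphi_{1,n-2k}\}_{k\in\mathbb{Z}}) = \operatorname{span}(\{h_{n-2k}\}_{k\in\mathbb{Z}}), \tag{7.10b} $$

and produce the lowpass and highpass approximations, respectively:

$$ x_V = \sum_{k\in\mathbb{Z}} \alpha_k g_{n-2k}, \tag{7.11a} $$

$$ x_W = \sum_{k\in\mathbb{Z}} \beta_k h_{n-2k}. \tag{7.11b} $$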



As the basis sequences spanning these spaces are orthogonal to each other and all
together form an orthonormal basis, the two projection subspaces together give back
the original space as in (1.10): ℓ²(Z) = V ⊕ W.
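For the Haar pair, the projections (7.11) act independently on each block (x_{2k}, x_{2k+1}): x_V repeats the block average, and x_W carries the half-difference with alternating sign. The following Python sketch (hypothetical helper `haar_projections`, even-length input assumed) checks that x_V + x_W = x, that x_V and x_W are orthogonal, and that their energies add up, as ℓ²(Z) = V ⊕ W implies.

```python
def haar_projections(x):
    """Return (x_V, x_W) of (7.11a)-(7.11b) for an even-length x and the Haar pair (7.1)."""
    x_v, x_w = [], []
    for n in range(0, len(x), 2):
        avg = (x[n] + x[n + 1]) / 2  # alpha_{n/2} * g, restricted to the block {n, n+1}
        dif = (x[n] - x[n + 1]) / 2  # beta_{n/2} * h, restricted to the same block
        x_v += [avg, avg]
        x_w += [dif, -dif]
    return x_v, x_w


x = [4.0, 2.0, -1.0, 3.0, 0.5, 0.5, 7.0, -2.0]
x_v, x_w = haar_projections(x)
print([a + b for a, b in zip(x_v, x_w)] == x)   # True: x_V + x_W = x
print(sum(a * b for a, b in zip(x_v, x_w)))     # 0.0: x_V and x_W are orthogonal
print(sum(a * a for a in x_v) + sum(b * b for b in x_w), sum(v * v for v in x))  # 83.5 83.5
```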

In this brief chapter preview, we introduced the two-channel filter bank as in
Figure 7.1(a). It uses orthogonal filters satisfying (7.8) and computes an expansion
with respect to the set of basis vectors {g_{n-2k}, h_{n-2k}}_{k∈Z}, yielding a decomposition
into approximation spaces V and W having complementary signal processing
properties. Our task now is to find appropriate filters (template basis sequences)
and develop properties of the filter bank in detail. We start by considering the
lowpass filter g, since everything else will follow from there. We concentrate only
on real-coefficient FIR filters since they are dominant in practice.
Figure 7.2: A sequence x is split into two approximation sequences x_V and x_W. An
orthonormal filter bank ensures that x_V and x_W are orthogonal and sum up to the original
sequence. We also show the split of ℓ²(Z) into two orthogonal complements V (lowpass
subspace) and W (highpass subspace).



Chapter Outline 

We start by showing how orthonormal bases are implemented by orthogonal filter
banks in Section 7.2 and follow by discussing three approaches to the design of
orthogonal filter banks in Section 7.3. We then discuss the theory and design of
biorthogonal filter banks in Sections 7.4 and 7.5. In Section 7.6 we discuss stochastic
filter banks, followed by algorithms in Section 7.7.

Notation used in this chapter: In this chapter, we consider real-coefficient filter
banks exclusively; pointers to complex-coefficient ones are given in Further Reading.
Thus, Hermitian transposition will occur rarely. When filter coefficients are complex,
the transposition in some places should be Hermitian transposition; however, only the
coefficients should be conjugated, not z. We will point these places out throughout
the chapter.



7.2 Orthogonal Two-Channel Filter Banks 

This section develops necessary conditions for the design of orthogonal two-channel
filter banks implementing orthonormal bases and the key properties of such filter
banks. We assume that the system shown in Figure 7.1(a) implements an orthonormal
basis for sequences in ℓ²(Z) using the basis sequences {g_{n-2k}, h_{n-2k}}_{k∈Z}. We
first determine what this means for the lowpass and highpass channels separately,
and follow by combining the channels. We then develop a polyphase representation
for orthogonal filter banks and discuss their polynomial approximation properties.
Figure 7.3: A sequence and its projections. (a) The sequence x with different-scale
features (low-frequency sinusoid, high-frequency noise, piecewise polynomial, and a Kronecker
delta sequence). (b) The lowpass projection x_V. (c) The highpass projection x_W.



7.2.1 A Single Channel and Its Properties 

We now look at each channel of Figure 7.1 separately and determine their properties.
As the lowpass and highpass channels are essentially symmetric, our approach is 
to establish (1) the properties inherent to each channel on its own; and (2) given 
one channel, establish the properties the other has to satisfy so as to build an 
orthonormal basis when combined. While we have seen most of the properties 
already, we summarize them here for completeness. 

Consider the lower branch of Figure 7.1(a), projecting the input x onto its lowpass
approximation x_V, depicted separately in Figure 7.4. In (7.11a), that lowpass
approximation x_V was given as



xv = y^ ak9n- 

fcez 



2k- 



Similarly, in ( 17. lib} ), the highpass approximation xw was given as 



x w 



E 

feez 



0khn- 



2k- 



(7.12a) 



(7.12b) 



Figure 7.4: The lowpass branch of a two-channel filter bank, mapping x to x_V.

Orthogonality of the Lowpass Filter. Since we started with an orthonormal basis,
the set {g_{n-2k}}_{k∈Z} is an orthonormal set. We have seen in Section 2.7.5 that such
a filter is termed orthogonal and satisfies (2.209):

$$
\begin{aligned}
\langle g_n, g_{n-2k}\rangle_n = \delta_k
&\quad\overset{\text{Matrix view}}{\longleftrightarrow}\quad D_2 G^T G U_2 = I \\
&\quad\overset{\text{ZT}}{\longleftrightarrow}\quad G(z)G(z^{-1}) + G(-z)G(-z^{-1}) = 2 \\
&\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad |G(e^{j\omega})|^2 + |G(e^{j(\omega+\pi)})|^2 = 2
\end{aligned}
\tag{7.13}
$$

In the matrix view, we have used linear operators (infinite matrices) introduced in
Section 2.7. These are: (1) downsampling by 2, D_2, from (2.180a); (2) upsampling
by 2, U_2, from (2.185a); and (3) filtering by G, from (2.63). The matrix view
expresses the fact that the columns of GU_2 form an orthonormal set.¹⁰⁶ The DTFT
version is the quadrature mirror formula from (2.208).



Lowpass Channel in a Two-Channel Orthogonal Filter Bank

  Domain              Expression                          Condition

Lowpass filter
  Original domain     g_n                                 ⟨g_n, g_{n-2k}⟩_n = δ_k
  Matrix domain       G                                   D_2 G^T G U_2 = I
  z-domain            G(z)                                G(z)G(z^{-1}) + G(-z)G(-z^{-1}) = 2
  DTFT domain         G(e^{jω})                           |G(e^{jω})|^2 + |G(e^{j(ω+π)})|^2 = 2
                                                          (quadrature mirror formula)
  Polyphase domain    G(z) = G_0(z^2) + z^{-1}G_1(z^2)    G_0(z)G_0(z^{-1}) + G_1(z)G_1(z^{-1}) = 1

Deterministic autocorrelation
  Original domain     a_n = ⟨g_k, g_{k+n}⟩_k              a_{2k} = δ_k
  Matrix domain       A = G^T G                           D_2 A U_2 = I
  z-domain            A(z) = G(z)G(z^{-1})                A(z) + A(-z) = 2
                                                          A(z) = 1 + 2 Σ_{k=0}^{∞} a_{2k+1}(z^{2k+1} + z^{-(2k+1)})
  DTFT domain         A(e^{jω}) = |G(e^{jω})|^2           A(e^{jω}) + A(e^{j(ω+π)}) = 2

Orthogonal projection onto smooth space V = span({g_{n-2k}}_{k∈Z})
                      x_V = P_V x                         P_V = G U_2 D_2 G^T

Table 7.1: Properties of the lowpass channel in an orthogonal two-channel filter bank.
Properties for the highpass channel are analogous.



106 We remind the reader once more that we are considering exclusively real-coefficient filter banks,
and thus use transposition instead of Hermitian transposition in (7.13).
Orthogonality of the Highpass Filter. Similarly to {g_{n-2k}}_{k∈Z}, the set {h_{n-2k}}_{k∈Z}
is an orthonormal set, and the sequence h can be seen as the impulse response of
an orthogonal filter satisfying:

$$
\begin{aligned}
\langle h_n, h_{n-2k}\rangle_n = \delta_k
&\quad\overset{\text{Matrix view}}{\longleftrightarrow}\quad D_2 H^T H U_2 = I \\
&\quad\overset{\text{ZT}}{\longleftrightarrow}\quad H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 2 \\
&\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad |H(e^{j\omega})|^2 + |H(e^{j(\omega+\pi)})|^2 = 2
\end{aligned}
\tag{7.14}
$$

The matrix view expresses the fact that the columns of HU_2 form an orthonormal
set. Again, the DTFT version is the quadrature mirror formula from (2.208).

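Deterministic Autocorrelation of the Lowpass Filter. As it is widely used in filter
design, we rephrase (7.13) in terms of the deterministic autocorrelation of g,
a_n = ⟨g_k, g_{k+n}⟩_k:

$$
\begin{aligned}
\langle g_n, g_{n-2k}\rangle_n = a_{2k} = \delta_k
&\quad\overset{\text{Matrix view}}{\longleftrightarrow}\quad D_2 A U_2 = I \\
&\quad\overset{\text{ZT}}{\longleftrightarrow}\quad A(z) + A(-z) = 2 \\
&\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad A(e^{j\omega}) + A(e^{j(\omega+\pi)}) = 2
\end{aligned}
\tag{7.15}
$$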



In the above, A = G^T G is a symmetric matrix with element a_k on the kth diagonal
to the left and right of the main diagonal. Thus, except for a_0, all the other even terms of
a_k are 0, leading to

$$ A(z) \overset{(a)}{=} G(z)G(z^{-1}) = 1 + 2\sum_{k=0}^{\infty} a_{2k+1}\left(z^{2k+1} + z^{-(2k+1)}\right), \tag{7.16} $$

where (a) follows from (2.142).
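Conditions (7.13), (7.15), and (7.16) can be checked numerically for a specific filter. The sketch below uses, as an assumed example not introduced in this section, the length-4 Daubechies lowpass filter g = [1+√3, 3+√3, 3-√3, 1-√3]/(4√2); the helper names are hypothetical. It computes the deterministic autocorrelation a_n, confirms that the even-indexed terms reduce to δ_n, and evaluates the quadrature mirror formula and (7.16) on a grid of frequencies.

```python
import cmath
import math

# Assumed example: length-4 Daubechies lowpass filter (not introduced in this section).
r3, c = math.sqrt(3), 4 * math.sqrt(2)
g = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]


def autocorrelation(f):
    """Deterministic autocorrelation a_n = sum_k f_k f_{k+n}, returned as {n: a_n}."""
    L = len(f)
    return {n: sum(f[k] * f[k + n] for k in range(L) if 0 <= k + n < L)
            for n in range(-L + 1, L)}


def dtft(f, w):
    """F(e^{jw}) = sum_n f_n e^{-jwn} for a finite causal filter f."""
    return sum(fn * cmath.exp(-1j * w * n) for n, fn in enumerate(f))


a = autocorrelation(g)
even_terms = {n: v for n, v in a.items() if n % 2 == 0}  # should equal delta_n, cf. (7.15)
odd_terms = {n: v for n, v in a.items() if n % 2 == 1}   # the free terms in (7.16)

grid = [2 * math.pi * i / 64 for i in range(64)]
# Quadrature mirror formula (7.13): |G(e^{jw})|^2 + |G(e^{j(w+pi)})|^2 = 2
qmf_error = max(abs(abs(dtft(g, w)) ** 2 + abs(dtft(g, w + math.pi)) ** 2 - 2) for w in grid)
# (7.16) on the unit circle: |G(e^{jw})|^2 = 1 + 2 * sum_{odd n > 0} a_n cos(w n)
a716_error = max(abs(1 + 2 * sum(v * math.cos(w * n) for n, v in odd_terms.items() if n > 0)
                     - abs(dtft(g, w)) ** 2) for w in grid)
print(even_terms, qmf_error, a716_error)
```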

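Deterministic Autocorrelation of the Highpass Filter. Similarly to the lowpass
filter, with A now denoting the deterministic autocorrelation of h,

$$
\begin{aligned}
\langle h_n, h_{n-2k}\rangle_n = a_{2k} = \delta_k
&\quad\overset{\text{Matrix view}}{\longleftrightarrow}\quad D_2 A U_2 = I \\
&\quad\overset{\text{ZT}}{\longleftrightarrow}\quad A(z) + A(-z) = 2 \\
&\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad A(e^{j\omega}) + A(e^{j(\omega+\pi)}) = 2
\end{aligned}
\tag{7.17}
$$

Equation (7.16) holds for this deterministic autocorrelation as well.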

Orthogonal Projection Property of the Lowpass Channel. We now look at the
lowpass channel as a composition of the four linear operators we just saw:

$$ x_V = P_V x = G U_2 D_2 G^T x. \tag{7.18} $$

The notation is evocative of projection onto V, and we will now show that the
lowpass channel accomplishes precisely this. Using (7.13), we check idempotency
and self-adjointness of P_V (Definition 1.27):

$$
\begin{aligned}
P_V^2 &= (G U_2 D_2 G^T)(G U_2 D_2 G^T) \overset{(a)}{=} G U_2 D_2 G^T = P_V, \\
P_V^T &= G (U_2 D_2)^T G^T \overset{(b)}{=} G U_2 D_2 G^T = P_V,
\end{aligned}
$$

where (a) follows from (7.13) and (b) from (2.190). Indeed, P_V is an orthogonal
projection operator, with the range given in (7.10a):

$$ V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}}). \tag{7.19} $$

The summary of properties of the lowpass channel is given in Table 7.1.
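To see the projection property on a finite example, one can replace the infinite matrices by a periodic (circular) version of length N; this periodization is an assumption made only for the sketch, not part of the text. With the length-4 Daubechies lowpass filter (again an assumed example) and N = 8, the columns g_{(n-2k) mod N} remain orthonormal, and P_V = Φ_g Φ_g^T plays the role of G U_2 D_2 G^T. The Python sketch below, with hypothetical helper names, checks idempotency and symmetry, and that the trace equals N/2, the dimension of V.

```python
import math

# Assumed example: length-4 Daubechies lowpass filter (not introduced in this section).
r3, c = math.sqrt(3), 4 * math.sqrt(2)
g = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]
N = 8  # period of the circular (periodized) setting used for this sketch


def basis_matrix(f, N):
    """N x (N/2) matrix whose k-th column is the circular even shift f_{(n - 2k) mod N}."""
    cols = []
    for k in range(N // 2):
        col = [0.0] * N
        for m, fm in enumerate(f):
            col[(2 * k + m) % N] += fm
        cols.append(col)
    return [[cols[k][n] for k in range(N // 2)] for n in range(N)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


Phi_g = basis_matrix(g, N)
P_V = matmul(Phi_g, transpose(Phi_g))  # periodic counterpart of G U_2 D_2 G^T
idem_err = max(abs(a - b) for ra, rb in zip(matmul(P_V, P_V), P_V) for a, b in zip(ra, rb))
sym_err = max(abs(P_V[i][j] - P_V[j][i]) for i in range(N) for j in range(N))
print(idem_err, sym_err)  # both at round-off level
```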

Orthogonal Projection Property of the Highpass Channel. The highpass channel
as a composition of four linear operators (infinite matrices) is:

$$ x_W = P_W x = H U_2 D_2 H^T x. \tag{7.20} $$

It is no surprise that P_W is an orthogonal projection operator with the range given
in (7.10b):

$$ W = \operatorname{span}(\{h_{n-2k}\}_{k\in\mathbb{Z}}). \tag{7.21} $$

The properties of the highpass channel are also summarized by Table 7.1: the table
is written for the lowpass channel, and the highpass entries follow by substituting h for g
(and H for G).

7.2.2 Complementary Channels and Their Properties 

While we have discussed which properties each channel has to satisfy on its own,
we now discuss what they have to satisfy with respect to each other to build an
orthonormal basis. Intuitively, one channel has to keep what the other throws away;
in other words, that channel should project onto a subspace orthogonal to the range
of the projection operator of the other. For example, given P_V, P_W should project
onto the orthogonal complement of P_V ℓ²(Z) in ℓ²(Z).

Thus, we start by assuming our filter bank in Figure 7.1(a) implements an
orthonormal basis, which means that the set of basis sequences {g_{n-2k}, h_{n-2k}}_{k∈Z}
is an orthonormal set, compactly represented by (7.8). We have already used the
orthonormality of the set {g_{n-2k}}_{k∈Z} in (7.13) as well as the orthonormality of the
set {h_{n-2k}}_{k∈Z} in (7.14). What is left is that these two sets are orthogonal to each
other, the third equation in (7.8).

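Orthogonality of the Lowpass and Highpass Filters. Using methods similar to those
used before, we summarize the conditions the lowpass and highpass sequences must satisfy:

$$
\begin{aligned}
\langle g_n, h_{n-2k}\rangle_n = 0
&\quad\overset{\text{Matrix view}}{\longleftrightarrow}\quad D_2 H^T G U_2 = 0 \\
&\quad\overset{\text{ZT}}{\longleftrightarrow}\quad G(z)H(z^{-1}) + G(-z)H(-z^{-1}) = 0 \\
&\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad G(e^{j\omega})H(e^{-j\omega}) + G(e^{j(\omega+\pi)})H(e^{-j(\omega+\pi)}) = 0
\end{aligned}
\tag{7.22}
$$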
Figure 7.5: Two-channel decomposition of a sequence using ideal filters. The left side
depicts the process in the lowpass channel, while the right side depicts the process in the
highpass channel. (a) Original spectrum. (b) Spectra after filtering. (c) Spectra after
downsampling. (d) Spectra after upsampling. (e) Spectra after interpolation filtering.
(f) Reconstructed spectrum.



Deterministic Crosscorrelation of the Lowpass and Highpass Filters. Instead of
the deterministic autocorrelation properties of an orthogonal filter, we look at the
deterministic crosscorrelation properties of two filters orthogonal to each other:

$$
\begin{aligned}
\langle g_n, h_{n-2k}\rangle_n = c_{2k} = 0
&\quad\overset{\text{Matrix view}}{\longleftrightarrow}\quad D_2 C U_2 = 0 \\
&\quad\overset{\text{ZT}}{\longleftrightarrow}\quad C(z) + C(-z) = 0 \\
&\quad\overset{\text{DTFT}}{\longleftrightarrow}\quad C(e^{j\omega}) + C(e^{j(\omega+\pi)}) = 0
\end{aligned}
\tag{7.23}
$$

In the above, C = H^T G is the deterministic crosscorrelation operator, and the
deterministic crosscorrelation is given by (2.99). In particular, all the even terms
of c are equal to zero.
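The vanishing of the even crosscorrelation terms can be checked directly. In the Python sketch below (hypothetical helper `crosscorrelation`), c_m = ⟨g_n, h_{n-m}⟩_n is computed for the Haar pair (7.1) and, as an assumed example, for the length-4 Daubechies lowpass filter together with the highpass filter h_n = (-1)^n g_{3-n} obtained from the construction (7.24) of Theorem 7.1.

```python
import math

s = 1 / math.sqrt(2)
r3, c = math.sqrt(3), 4 * math.sqrt(2)


def crosscorrelation(g, h):
    """c_m = <g_n, h_{n-m}>_n = sum_n g_n h_{n-m}, returned as {m: c_m}."""
    return {m: sum(g[n] * h[n - m] for n in range(len(g)) if 0 <= n - m < len(h))
            for m in range(-(len(h) - 1), len(g))}


# Haar pair (7.1a), (7.1b).
haar_c = crosscorrelation([s, s], [s, -s])

# Assumed example: length-4 Daubechies lowpass, highpass from (7.24) with the positive sign.
g4 = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]
h4 = [(-1) ** n * g4[3 - n] for n in range(4)]
d4_c = crosscorrelation(g4, h4)

print({m: round(v, 12) for m, v in haar_c.items()})              # even m: 0, odd m: +/-0.5
print({m: round(v, 12) for m, v in d4_c.items() if m % 2 == 0})  # all (numerically) zero
```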

7.2.3 Orthogonal Two-Channel Filter Bank 

We are now ready to put together everything we have developed so far. We have
shown that the sequences {g_{n-2k}, h_{n-2k}}_{k∈Z} form an orthonormal set. What is left
to show is completeness: any sequence from ℓ²(Z) can be represented using the
orthonormal basis built by our orthogonal two-channel filter bank. To do this, we
must be more specific, that is, we must have an explicit form of the filters involved.
In essence, we start with an educated guess (which will turn out to be unique,
Theorem 7.2), inspired by what we have seen in the Haar case. We can also
strengthen our intuition by considering a two-channel filter bank with ideal filters
as in Figure 7.5. If we are given an orthogonal lowpass filter g, can we say anything
about an appropriate orthogonal highpass filter h such that the two together build
an orthonormal basis? A good approach would be to shift the spectrum of the lowpass
filter by π, leading to the highpass filter. In the time domain, this is equivalent to
multiplying g_n by (-1)^n. Because of the orthogonality of the lowpass and highpass
filters, we also reverse the impulse response of g. We will then need to shift the
filter to make it causal again. Based on this discussion, we now show that an
orthogonal filter g completely specifies an orthogonal two-channel filter bank
implementing an orthonormal basis for ℓ²(Z):

Theorem 7.1 (Orthogonal two-channel filter bank) Given is an FIR
filter g of even length L = 2ℓ, ℓ ∈ Z⁺, orthonormal to its even shifts as in
(7.13). Choose

$$ h_n = \pm(-1)^n g_{-n+L-1} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad H(z) = \mp z^{-L+1} G(-z^{-1}). \tag{7.24} $$

Then, {g_{n-2k}, h_{n-2k}}_{k∈Z} is an orthonormal basis for ℓ²(Z), implemented by an
orthogonal filter bank specified by analysis filters {g_{-n}, h_{-n}} and synthesis filters
{g_n, h_n}. The expansion splits ℓ²(Z) as

$$ \ell^2(\mathbb{Z}) = V \oplus W, \quad\text{with}\quad V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}}), \quad W = \operatorname{span}(\{h_{n-2k}\}_{k\in\mathbb{Z}}). \tag{7.25} $$



Proof. To prove the theorem, we must prove that (i) {g_{n-2k}, h_{n-2k}}_{k∈Z} is an orthonormal
set and (ii) it is complete. The sign ± in (7.24) only changes the phase; assuming
G(1) = √2, if the sign is positive, H(-1) = √2, and if the sign is negative, H(-1) = -√2.
Most of the time we will implicitly assume the sign to be positive; the proof of the
theorem does not change in either case.

(i) To prove that {g_{n-2k}, h_{n-2k}}_{k∈Z} is an orthonormal set, we must prove (7.8).
The first condition is satisfied by assumption. To prove the second, that is, h is
orthogonal to its even shifts, we must prove one of the conditions in (7.14). The
definition of h in (7.24) implies

$$ H(z)H(z^{-1}) = G(-z)G(-z^{-1}), \tag{7.26} $$

and thus,

$$ H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = G(-z)G(-z^{-1}) + G(z)G(z^{-1}) \overset{(a)}{=} 2, $$

where (a) follows from (7.13).

To prove the third condition in (7.8), that is, h is orthogonal to g and all its
even shifts, we must prove one of the conditions in (7.22):

$$
\begin{aligned}
G(z)H(z^{-1}) + G(-z)H(-z^{-1}) &\overset{(a)}{=} -z^{L-1}G(z)G(-z) + (-1)^L z^{L-1} G(-z)G(z) \\
&\overset{(b)}{=} -z^{L-1}G(z)G(-z) + z^{L-1}G(z)G(-z) = 0,
\end{aligned}
$$

where (a) follows from (7.24); and (b) from L = 2ℓ being even.
(ii) To prove completeness, we prove that perfect reconstruction holds for any x ∈
ℓ²(Z) (an alternative would be to prove Parseval's equality ‖x‖² = ‖x_V‖² +
‖x_W‖²). What we do is find z-domain expressions for X_V(z) and X_W(z) and
prove they sum up to X(z). We start with the lowpass branch. In the lowpass
channel, the input X(z) is filtered by G(z^{-1}), and is then down- and upsampled,
followed by filtering with G(z) (and similarly for the highpass channel). Thus,
the z-transforms of x_V and x_W are:

$$ X_V(z) = \tfrac{1}{2}\, G(z) \left[ G(z^{-1})X(z) + G(-z^{-1})X(-z) \right], \tag{7.27a} $$

$$ X_W(z) = \tfrac{1}{2}\, H(z) \left[ H(z^{-1})X(z) + H(-z^{-1})X(-z) \right]. \tag{7.27b} $$

The output of the filter bank is the sum of x_V and x_W:

$$ X_V(z) + X_W(z) = \tfrac{1}{2} \underbrace{\left[ G(z)G(-z^{-1}) + H(z)H(-z^{-1}) \right]}_{S(z)} X(-z) + \tfrac{1}{2} \underbrace{\left[ G(z)G(z^{-1}) + H(z)H(z^{-1}) \right]}_{T(z)} X(z). \tag{7.28} $$

Substituting (7.24) into the above equation, we get:

$$
\begin{aligned}
S(z) &= G(z)G(-z^{-1}) + H(z)H(-z^{-1}) \\
&\overset{(a)}{=} G(z)G(-z^{-1}) + \left[-z^{-L+1}G(-z^{-1})\right]\left[-(-z^{-1})^{-L+1}G(z)\right] \\
&= \left[1 + (-1)^{-L+1}\right] G(z)G(-z^{-1}) \overset{(b)}{=} 0,
\end{aligned}
\tag{7.29a}
$$

$$
\begin{aligned}
T(z) &= G(z)G(z^{-1}) + H(z)H(z^{-1}) \\
&\overset{(c)}{=} G(z)G(z^{-1}) + G(-z^{-1})G(-z) \overset{(d)}{=} 2,
\end{aligned}
\tag{7.29b}
$$

where (a) follows from (7.24); (b) from L = 2ℓ being even; (c) from (7.26); and (d)
from (7.13). Substituting this back into (7.28), we get

$$ X_V(z) + X_W(z) = X(z), \tag{7.30} $$

proving perfect reconstruction, or, in other words, the assertion in the theorem
statement that the expansion can be implemented by an orthogonal filter bank.
To show (7.25), we write (7.30) in the original domain as in (7.11):

$$ x_n = \sum_{k\in\mathbb{Z}} \alpha_k g_{n-2k} + \sum_{k\in\mathbb{Z}} \beta_k h_{n-2k}, \tag{7.31} $$

showing that any sequence x ∈ ℓ²(Z) can be written as a sum of its projections
onto two subspaces V and W, and these subspaces add up to ℓ²(Z). V and W
are orthogonal from (7.22), proving (7.25).

In the theorem, L is an even integer, which is a requirement for FIR filters of lengths
greater than 1 (see Exercise 7.2). Moreover, the choice (7.24) is unique; this will
be shown in Theorem 7.2. Table 7.9 summarizes various properties of orthogonal,
two-channel filter banks we have covered so far.
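Theorem 7.1 can be illustrated numerically: build h from g with (7.24), run a finite input through the two-channel bank, and compare. The Python sketch below assumes the length-4 Daubechies lowpass filter as g (an example not derived in this section), a finite input that is zero outside n = 0, ..., N - 1, and hypothetical helper names. All coefficients α_k, β_k that can be nonzero are computed, so reconstruction and Parseval's equality should hold up to round-off.

```python
import math


def highpass_from_lowpass(g, sign=1):
    """h_n = sign * (-1)^n g_{L-1-n}: the choice (7.24), with L = len(g) even."""
    L = len(g)
    return [sign * (-1) ** n * g[L - 1 - n] for n in range(L)]


def analyze(x, f):
    """{k: <x_n, f_{n-2k}>_n} for every k at which f_{n-2k} can overlap x."""
    ks = range(-(len(f) // 2), len(x) // 2 + 1)
    return {k: sum(x[n] * f[n - 2 * k] for n in range(len(x)) if 0 <= n - 2 * k < len(f))
            for k in ks}


def synthesize(coeffs, f, length):
    """sum_k c_k f_{n-2k}, evaluated for n = 0, ..., length - 1."""
    y = [0.0] * length
    for k, ck in coeffs.items():
        for m, fm in enumerate(f):
            if 0 <= 2 * k + m < length:
                y[2 * k + m] += ck * fm
    return y


# Assumed test filter: length-4 Daubechies lowpass (not derived in this section).
r3, c = math.sqrt(3), 4 * math.sqrt(2)
g = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]
h = highpass_from_lowpass(g)

x = [1.0, -2.0, 0.5, 3.0, 3.0, -1.0, 0.0, 2.5, -0.5, 1.5]
alpha, beta = analyze(x, g), analyze(x, h)
x_rec = [u + v for u, v in zip(synthesize(alpha, g, len(x)), synthesize(beta, h, len(x)))]

pr_error = max(abs(u - v) for u, v in zip(x, x_rec))  # perfect reconstruction, cf. (7.30)
energy_gap = (sum(v * v for v in x) - sum(v * v for v in alpha.values())
              - sum(v * v for v in beta.values()))   # Parseval's equality
print(pr_error, energy_gap)  # both at round-off level
```

Applied to the Haar lowpass filter, `highpass_from_lowpass` returns the Haar highpass filter (7.1b), in line with the remark that follows the theorem.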
Along with the time reversal and shift, the other qualitative feature of (7.24)
is modulation by e^{jπn} = (-1)^n (mapping z → -z in the z-domain, see (2.136)). As
we said, this makes h a highpass filter when g is a lowpass filter. As an example,
if we apply Theorem 7.1 to the Haar lowpass filter from (7.1a), we obtain the Haar
highpass filter from (7.1b).

In applications, filters are causal. To implement a filter bank with causal
filters, we make the analysis filters causal (we already assumed the synthesis ones are)
by delaying them both by L - 1 samples (multiplying by z^{-L+1}). Beware that such an
implementation achieves perfect reconstruction only within a shift (delay), and the
orthonormal basis expansion is not technically valid anymore. However, this is often
done in applications, as the output sequence is a perfect replica of the input one,
within a shift: x̂_n = x_{n-L+1}.
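The delay can be seen in a sketch that uses only causal operations: the analysis filters are the time-reversed sequences g_{L-1-n} and h_{L-1-n}, followed by keeping the even-indexed samples, upsampling, and causal synthesis filtering. In the Python sketch below (hypothetical helper names; the length-4 Daubechies lowpass filter is an assumed example), the output should equal the input delayed by L - 1 = 3 samples.

```python
import math


def convolve(a, b):
    """Full linear convolution of two finite causal sequences."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def causal_two_channel(x, g, h):
    """Causal analysis (time-reversed filters delayed by L - 1), down- and upsampling
    by 2, then causal synthesis with g and h; returns the summed output."""
    branches = []
    for f in (g, h):
        coeffs = convolve(x, f[::-1])[::2]  # filter by f_{L-1-n}, keep even-indexed samples
        up = [0.0] * (2 * len(coeffs))
        up[::2] = coeffs                     # upsample by 2
        branches.append(convolve(up, f))     # synthesis filter
    return [u + v for u, v in zip(branches[0], branches[1])]


r3, c = math.sqrt(3), 4 * math.sqrt(2)
g = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]  # assumed D4 lowpass
L = len(g)
h = [(-1) ** n * g[L - 1 - n] for n in range(L)]                # (7.24), positive sign

x = [1.0, -2.0, 0.5, 3.0, 3.0, -1.0, 0.0, 2.5]
y = causal_two_channel(x, g, h)
delayed = [0.0] * (L - 1) + x + [0.0] * (len(y) - len(x) - (L - 1))
delay_error = max(abs(u - v) for u, v in zip(y, delayed))
print(delay_error)  # round-off level: y_n = x_{n-L+1}
```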

7.2.4 Polyphase View of Orthogonal Filter Banks 

As we saw in Section 2.7, downsampling introduces periodic shift variance into the
system. To deal with this, we often analyze multirate systems in the polyphase domain,
as discussed in Section 2.7.6. The net result is that the analysis of a single-input,
single-output, periodically shift-varying system is equivalent to the analysis of a
multiple-input, multiple-output, shift-invariant system.

Polyphase Representation of an Input Sequence. For two-channel filter banks,
a polyphase decomposition of the input sequence is achieved by simply splitting it
into its even- and odd-indexed subsequences as in (2.210), the main idea being that
the sequence can be recovered from the two subsequences by upsampling, shifting,
and summing up, as we have seen in Figure 2.23. This simple process is called a
polyphase transform (forward and inverse).
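The forward and inverse polyphase transforms are one-liners on finite sequences. The Python sketch below (hypothetical function names) splits a sequence into x_{0,n} = x_{2n} and x_{1,n} = x_{2n+1} and interleaves them back, which is what upsampling both subsequences by 2, delaying the odd one by one sample, and summing accomplishes.

```python
def polyphase_forward(x):
    """Split x into x_{0,n} = x_{2n} (even-indexed) and x_{1,n} = x_{2n+1} (odd-indexed)."""
    return x[0::2], x[1::2]


def polyphase_inverse(x0, x1):
    """Upsample both subsequences by 2, shift the odd one by one sample, and add.
    Expects len(x0) - len(x1) to be 0 or 1, as produced by polyphase_forward."""
    x = [0] * (len(x0) + len(x1))
    x[0::2] = x0
    x[1::2] = x1
    return x


x0, x1 = polyphase_forward([1, 2, 3, 4, 5])
print(x0, x1)                     # [1, 3, 5] [2, 4]
print(polyphase_inverse(x0, x1))  # [1, 2, 3, 4, 5]
```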

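Polyphase Representation of a Synthesis Filter Bank. To define the polyphase
decomposition of the synthesis filters, we use the expressions for upsampling followed
by filtering from (2.215):

$$ g_{0,n} = g_{2n} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad G_0(z) = \sum_{n\in\mathbb{Z}} g_{2n} z^{-n}, \tag{7.32a} $$

$$ g_{1,n} = g_{2n+1} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad G_1(z) = \sum_{n\in\mathbb{Z}} g_{2n+1} z^{-n}, \tag{7.32b} $$

$$ G(z) = G_0(z^2) + z^{-1} G_1(z^2), \tag{7.32c} $$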



where we split each synthesis filter into its even and odd subsequences as we have
done for the input sequence x. Analogous relations hold for the highpass filter h.
We can now define a polyphase matrix Φ_p(z):

$$ \Phi_p(z) = \begin{bmatrix} G_0(z) & H_0(z) \\ G_1(z) & H_1(z) \end{bmatrix}. \tag{7.33} $$



As we will see in (7.37), such a matrix allows for a compact representation, analysis,
and computation of projections in the polyphase domain.
Polyphase Representation of an Analysis Filter Bank. The matrix in (7.33) describes
the synthesis side; to get the analysis side, we can use the fact that this is an
orthogonal filter bank. Thus, we can write

$$ \widetilde{G}(z) = G(z^{-1}) = G_0(z^{-2}) + z\, G_1(z^{-2}). $$

In other words, the polyphase components of the analysis filter are, not surprisingly,
time-reversed versions of the polyphase components of the synthesis filter. We can
summarize this as (we could have obtained the same result using the expression for the
polyphase representation of filtering followed by downsampling (2.221)):



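$$ \tilde{g}_{0,n} = \tilde{g}_{2n} = g_{-2n} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad \widetilde{G}_0(z) = \sum_{n\in\mathbb{Z}} g_{-2n} z^{-n}, \tag{7.34a} $$

$$ \tilde{g}_{1,n} = \tilde{g}_{2n-1} = g_{-2n+1} \quad\overset{\text{ZT}}{\longleftrightarrow}\quad \widetilde{G}_1(z) = \sum_{n\in\mathbb{Z}} g_{-2n+1} z^{-n}, \tag{7.34b} $$

$$ \widetilde{G}(z) = \widetilde{G}_0(z^2) + z\, \widetilde{G}_1(z^2) = G_0(z^{-2}) + z\, G_1(z^{-2}), \tag{7.34c} $$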

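with analogous relations for the highpass filter h, yielding the expression for the
analysis polyphase matrix

$$ \widetilde{\Phi}_p(z) = \begin{bmatrix} \widetilde{G}_0(z) & \widetilde{H}_0(z) \\ \widetilde{G}_1(z) & \widetilde{H}_1(z) \end{bmatrix} = \begin{bmatrix} G_0(z^{-1}) & H_0(z^{-1}) \\ G_1(z^{-1}) & H_1(z^{-1}) \end{bmatrix} = \Phi_p(z^{-1}). \tag{7.35} $$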



A block diagram of the polyphase implementation of the system is given in
Figure 7.6. The left part shows the reconstruction of the original sequence using the
synthesis polyphase matrix.¹⁰⁷ The right part shows the computation of the expansion
coefficient sequences α and β; note that, as usual, the analysis matrix (polyphase in
this case) is taken as a transpose, as it operates on the input sequence. To check
this, compute these expansion coefficient sequences:

$$ \begin{bmatrix} \alpha(z) \\ \beta(z) \end{bmatrix} = \Phi_p^T(z^{-1}) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix} = \begin{bmatrix} G_0(z^{-1}) & G_1(z^{-1}) \\ H_0(z^{-1}) & H_1(z^{-1}) \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix} = \begin{bmatrix} G_0(z^{-1})X_0(z) + G_1(z^{-1})X_1(z) \\ H_0(z^{-1})X_0(z) + H_1(z^{-1})X_1(z) \end{bmatrix}. \tag{7.36} $$



We can obtain exactly the same expressions if we substitute (7.34) into the expression
for filtering followed by downsampling by 2 in (2.196a).
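The equivalence can also be checked numerically. In the time domain, the first row of (7.36) reads α_k = Σ_m x_{0,m} g_{0,m-k} + Σ_m x_{1,m} g_{1,m-k}. The Python sketch below (hypothetical helper names; an arbitrary test filter and input, which need not be orthogonal for this identity) compares it with the direct computation α_k = ⟨x_n, g_{n-2k}⟩_n.

```python
def direct_coeffs(x, g, ks):
    """alpha_k = <x_n, g_{n-2k}>_n: filtering by g_{-n} followed by downsampling by 2."""
    return [sum(x[n] * g[n - 2 * k] for n in range(len(x)) if 0 <= n - 2 * k < len(g))
            for k in ks]


def polyphase_coeffs(x, g, ks):
    """Time-domain form of G_0(z^{-1}) X_0(z) + G_1(z^{-1}) X_1(z), the first row of (7.36)."""
    x0, x1 = x[0::2], x[1::2]  # input polyphase components
    g0, g1 = g[0::2], g[1::2]  # synthesis-filter polyphase components (7.32)

    def xcorr(u, v, k):
        """sum_m u_m v_{m-k}."""
        return sum(u[m] * v[m - k] for m in range(len(u)) if 0 <= m - k < len(v))

    return [xcorr(x0, g0, k) + xcorr(x1, g1, k) for k in ks]


g = [0.3, -0.5, 1.2, 0.7, -0.1, 0.4]                    # arbitrary test filter
x = [1.0, 2.0, -1.5, 0.25, 3.0, -2.0, 0.5, 1.0, -0.75]  # arbitrary test input
ks = range(-3, 6)
direct = direct_coeffs(x, g, ks)
poly = polyphase_coeffs(x, g, ks)
print(max(abs(u - v) for u, v in zip(direct, poly)))  # round-off level
```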

Polyphase Representation of an Orthogonal Filter Bank. The above polyphase
expressions now allow us to compactly represent an orthogonal two-channel filter
bank in the polyphase domain:

$$ X(z) = \begin{bmatrix} 1 & z^{-1} \end{bmatrix} \Phi_p(z^2)\, \Phi_p^T(z^{-2}) \begin{bmatrix} X_0(z^2) \\ X_1(z^2) \end{bmatrix}. \tag{7.37} $$



107 A comment is in order: we typically put the lowpass filter in the lower branch, but in matrices 
it appears in the first row/column, leading to a slight inconsistency when the filter bank is depicted 
in the polyphase domain. 
Figure 7.6: Polyphase representation of a two-channel orthogonal filter bank.



From (7.24), we get that the polyphase components of H are

$$ H_0(z) = \pm z^{-L/2+1} G_1(z^{-1}), \tag{7.38a} $$

$$ H_1(z) = \mp z^{-L/2+1} G_0(z^{-1}), \tag{7.38b} $$

leading to the polyphase matrix

$$ \Phi_p(z) = \begin{bmatrix} G_0(z) & \pm z^{-L/2+1} G_1(z^{-1}) \\ G_1(z) & \mp z^{-L/2+1} G_0(z^{-1}) \end{bmatrix}. \tag{7.39} $$



Since g is orthogonal to its even translates, substitute (7.32) into the z-domain
version of (7.13) to get the condition for orthogonality of a filter in polyphase form:

$$ G_0(z)G_0(z^{-1}) + G_1(z)G_1(z^{-1}) = 1. \tag{7.40} $$

Using this, the determinant of Φ_p(z) becomes ∓z^{-L/2+1}, that is, -z^{-L/2+1} for the
positive sign in (7.24). From (7.37), the polyphase matrix Φ_p(z) satisfies the following:

$$ \Phi_p(z)\, \Phi_p^T(z^{-1}) = I, \tag{7.41} $$

a paraunitary matrix as in (2.294a). In fact, (7.39), together with (7.40), defines the
most general 2 × 2, real-coefficient, causal FIR lossless matrix, a fact we summarize
in the form of a theorem, the proof of which can be found in [162]:
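Paraunitarity (7.41) and the determinant can be checked by evaluating the polyphase matrix on the unit circle, where z^{-1} is the complex conjugate of z. The Python sketch below assumes the length-4 Daubechies lowpass filter (an example not derived in this section), builds h with the positive sign in (7.24), forms Φ_p(z) as in (7.33) from the polyphase components, and measures the deviation of Φ_p(z) Φ_p^T(z^{-1}) from I and of det Φ_p(z) from -z^{-L/2+1}; helper names are hypothetical.

```python
import cmath
import math


def poly_eval(coeffs, z):
    """sum_n c_n z^{-n} for a finite causal coefficient list."""
    return sum(cn * z ** (-n) for n, cn in enumerate(coeffs))


def polyphase_matrix(g, h, z):
    """Phi_p(z) as in (7.33): first column G_0, G_1; second column H_0, H_1."""
    return [[poly_eval(g[0::2], z), poly_eval(h[0::2], z)],
            [poly_eval(g[1::2], z), poly_eval(h[1::2], z)]]


# Assumed example: length-4 Daubechies lowpass; highpass from (7.24) with the positive sign.
r3, c = math.sqrt(3), 4 * math.sqrt(2)
g = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]
L = len(g)
h = [(-1) ** n * g[L - 1 - n] for n in range(L)]

para_error, det_error = 0.0, 0.0
for i in range(32):
    z = cmath.exp(2j * math.pi * i / 32)  # point on the unit circle
    P, Q = polyphase_matrix(g, h, z), polyphase_matrix(g, h, 1 / z)
    for r in range(2):
        for s in range(2):
            # entry (r, s) of Phi_p(z) Phi_p^T(z^{-1}) is sum_t P[r][t] Q[s][t]
            entry = sum(P[r][t] * Q[s][t] for t in range(2))
            para_error = max(para_error, abs(entry - (1 if r == s else 0)))
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    det_error = max(det_error, abs(det + z ** (1 - L // 2)))  # expected det = -z^{-L/2+1}
print(para_error, det_error)  # both at round-off level
```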



Theorem 7.2 (General form of a paraunitary matrix) The most general
2 × 2, real-coefficient, causal FIR lossless matrix is given by (7.39), where
G_0 and G_1 satisfy (7.40) and L/2 - 1 is the order of G_0(z), G_1(z).



Example 7.1 (Haar filter bank in polyphase form) The Haar filters (7.1)
are extremely simple in polyphase form: since they are both of length 2, their
polyphase components are of length 1. The polyphase matrix is simply

$$ \Phi_p(z) = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{7.42} $$



The form of the polyphase matrix for the Haar orthonormal basis is exactly the
same as the Haar orthonormal basis for R², or one block of the Haar orthonormal
basis infinite matrix Φ from (1.7). This is true only when a filter bank implements
the so-called block transform, that is, when the nonzero support of the basis
sequences is equal to the sampling factor, 2 in this case.

a3.0 [October 2011] CC by-nc-nd
Comments to book-errata@FourierAndWavelets.org
Fourier and Wavelet Signal Processing
Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal
7.2. Orthogonal Two-Channel Filter Banks
583

The polyphase notation and the associated matrices are powerful tools to
derive filter bank results. We now rephrase, in polyphase terms, what it means for a
filter bank to be orthogonal, that is, to implement an orthonormal basis.



Theorem 7.3 (Paraunitary polyphase matrix and orthonormal basis)
A 2 × 2 polyphase matrix Φ_p(z) is paraunitary if and only if the associated
two-channel filter bank implements an orthonormal basis for ℓ²(Z).



Proof. If the polyphase matrix is paraunitary, then the expansion it implements is
complete, due to (7.41). To prove that the expansion is an orthonormal basis, we must
show that the basis sequences form an orthonormal set. From (7.39) and (7.41), we get
(7.40). Substituting this into the z-domain version of (7.13), we see that it holds, and
thus g and its even shifts form an orthonormal set. Because h is given in terms of g as
(7.24), h and its even shifts form an orthonormal set as well. Finally, because of the
way h is defined, g and h are orthogonal by definition and so are their even shifts.

The argument in the other direction is similar; we start with an orthonormal basis
implemented by a two-channel filter bank. That means we have template sequences g
and h related via (7.24), and their even shifts, all together forming an orthonormal
basis. We can now translate those conditions into the z-transform domain using (7.13) and
derive the corresponding polyphase-domain versions, such as the one in (7.40). These
lead to the polyphase matrix being paraunitary.

We have seen in Chapter 2 that we can characterize vector sequences using
deterministic autocorrelation matrices (see Table 2.13). We use this now to
describe the deterministic autocorrelation of a vector sequence of expansion
coefficients [α_n β_n]^T, as

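$$
\begin{aligned}
A_{p,\alpha}(z) &= \begin{bmatrix} A_\alpha(z) & C_{\alpha,\beta}(z) \\ C_{\beta,\alpha}(z) & A_\beta(z) \end{bmatrix}
= \begin{bmatrix} \alpha(z)\alpha(z^{-1}) & \alpha(z)\beta(z^{-1}) \\ \beta(z)\alpha(z^{-1}) & \beta(z)\beta(z^{-1}) \end{bmatrix}
= \begin{bmatrix} \alpha(z) \\ \beta(z) \end{bmatrix} \begin{bmatrix} \alpha(z^{-1}) & \beta(z^{-1}) \end{bmatrix} \\
&\overset{(a)}{=} \Phi_p^T(z^{-1}) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix} \begin{bmatrix} X_0(z^{-1}) & X_1(z^{-1}) \end{bmatrix} \Phi_p(z)
= \Phi_p^T(z^{-1})\, A_{p,x}(z)\, \Phi_p(z),
\end{aligned}
\tag{7.43}
$$

where (a) follows from (7.36), and A_{p,x} is the deterministic autocorrelation matrix
of the vector of polyphase components of x. This deterministic autocorrelation
matrix can be seen as a filtered deterministic autocorrelation of the input. We now
have the following result: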
Theorem 7.4 (Filtered deterministic autocorrelation matrix) Given
is a 2 × 2 paraunitary polyphase matrix Φ_p(e^{jω}). Then the filtered deterministic
autocorrelation matrix, A_{p,α}(e^{jω}), is positive semidefinite.



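Proof. Since Φ_p(z) is paraunitary, Φ_p(e^{jω}) is unitary on the unit circle. This further
means that:

$$ \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} \Phi_p^T(e^{-j\omega}) = \begin{bmatrix} \cos\phi & \sin\phi \end{bmatrix}, \tag{7.44} $$

for some φ. We can now write:

$$
\begin{aligned}
\begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} A_{p,\alpha}(e^{j\omega}) \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
&\overset{(a)}{=} \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} \Phi_p^T(e^{-j\omega})\, A_{p,x}(e^{j\omega})\, \Phi_p(e^{j\omega}) \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \\
&\overset{(b)}{=} \begin{bmatrix} \cos\phi & \sin\phi \end{bmatrix} A_{p,x}(e^{j\omega}) \begin{bmatrix} \cos\phi \\ \sin\phi \end{bmatrix} \overset{(c)}{\geq} 0,
\end{aligned}
$$

where (a) follows from (7.43); (b) from (7.44); and (c) from TBD, proving the theorem.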



7.2.5 Polynomial Approximation by Filter Banks 

An important class of orthogonal filter banks are those that have polynomial
approximation properties; these filter banks will approximate polynomials of a certain
degree¹⁰⁸ in the lowpass (coarse) branch, while, at the same time, blocking those
same polynomials in the highpass (detail) branch. To derive these filter banks, we
recall what we have learned in Section 2.B.1. Convolution of a polynomial sequence
x with a differencing filter (δ_n - δ_{n-1}), or, equivalently, multiplication of X(z) by
(1 - z^{-1}), reduces the degree of the polynomial by 1. In general, to block a polynomial of
degree (N - 1), x_n = Σ_{k=0}^{N-1} a_k n^k, we need a filter of the form:

$$ (1 - z^{-1})^N R'(z). \tag{7.45} $$
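The blocking property of (1 - z^{-1})^N is easy to verify on integer sequences. In the Python sketch below (hypothetical helper names), the filter coefficients are the binomial expansion (-1)^k C(N, k), and the output is kept only where the filter lies entirely inside the finite input, which sidesteps the boundary issues mentioned in footnote 108. With N = 3, a quadratic is mapped to zero, while adding n³ leaves the constant 3! = 6.

```python
from math import comb


def difference_filter(N):
    """Coefficients of (1 - z^{-1})^N, i.e., (-1)^k * C(N, k) for k = 0, ..., N."""
    return [(-1) ** k * comb(N, k) for k in range(N + 1)]


def apply_valid(x, f):
    """y_n = sum_k f_k x_{n-k}, kept only where f lies entirely inside the finite x."""
    return [sum(f[k] * x[n - k] for k in range(len(f))) for n in range(len(f) - 1, len(x))]


N = 3
quadratic = [2 - 5 * n + 3 * n * n for n in range(12)]      # degree N - 1 = 2
with_cubic = [v + n ** 3 for n, v in enumerate(quadratic)]  # degree N = 3
print(difference_filter(N))                         # [1, -3, 3, -1]
print(apply_valid(quadratic, difference_filter(N)))   # all zeros
print(apply_valid(with_cubic, difference_filter(N)))  # all equal to 3! = 6
```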



Let us now apply what we just learned to two-channel orthogonal filter banks with polynomial sequences as inputs. We will construct the analysis filter in the highpass branch to have N zeros at $z = 1$, thus blocking polynomials of degree up to $(N-1)$. Of course, since the filter bank is perfect reconstruction, whatever disappeared in the highpass branch must be preserved in the lowpass one; thus, the lowpass branch will reconstruct polynomials of degree up to $(N-1)$. In other words, $x_V$ will be a polynomial approximation of the input sequence of a certain degree.
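A minimal numerical check of the annihilation property behind (7.45), sketched in Python with only the standard library: each pass of the differencing filter $\delta_n - \delta_{n-1}$ lowers the degree of a polynomial sequence by one, so N passes wipe out any polynomial of degree $(N-1)$. The helper names and the example cubic are our own illustrative choices; to sidestep the boundary issues of footnote 108, the sketch keeps only output samples whose filter support lies inside the finite input.

```python
from fractions import Fraction


def difference(x):
    """One pass of the differencing filter (1 - z^-1): y_n = x_n - x_{n-1}.

    Only 'valid' outputs are kept (no zero padding), so the finite-length
    boundary cannot contaminate the result.
    """
    return [x[n] - x[n - 1] for n in range(1, len(x))]


def apply_differences(x, times):
    """Filter x with (1 - z^-1)^times, keeping only valid output samples."""
    for _ in range(times):
        x = difference(x)
    return x


def polynomial_sequence(coeffs, length):
    """x_n = sum_k a_k n^k for n = 0..length-1, in exact rational arithmetic."""
    return [sum(Fraction(a) * n**k for k, a in enumerate(coeffs)) for n in range(length)]


# A cubic sequence (degree N - 1 = 3, so N = 4): x_n = 2 - n + 3 n^2 - n^3 / 2.
x = polynomial_sequence([2, -1, 3, Fraction(-1, 2)], 20)

print(apply_differences(x, 3)[:4])  # three passes leave a nonzero constant (3! * a_3)
print(apply_differences(x, 4)[:4])  # four passes, (1 - z^-1)^4, annihilate the cubic
```

Exact fractions are used so that "zero" means exactly zero rather than a small floating-point residue.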

To construct such a filter bank, we start with the analysis highpass filter $\tilde h$, which must be of the form (7.45); we write it as:

$$\tilde H(z) \overset{(a)}{=} (1 - z^{-1})^N \underbrace{\left(\mp z^{L-1} R(-z)\right)}_{R'(z)} = \mp z^{L-1}(1 - z^{-1})^N R(-z) \overset{(b)}{=} \mp z^{L-1} G(-z),$$

¹⁰⁸ We restrict our attention to finitely-supported polynomial sequences, ignoring the boundary issues. If this were not the case, these sequences would not belong to any $\ell^p$ space.
where in (a) we have chosen $R'(z)$ to lead to a simple form of $G(z)$ in what follows; and (b) follows from Table 7.9, allowing us to directly read the synthesis lowpass as

$$G(z) = (1 + z^{-1})^N R(z). \qquad (7.46)$$

If we maintain the convention that g is causal and of length L, then $R(z)$ is a polynomial in $z^{-1}$ of degree $(L - 1 - N)$. Of course, $R(z)$ has to be chosen appropriately, so as to obtain an orthogonal filter bank.

Putting at least one zero at $z = -1$ in $G(z)$ makes a lot of signal processing sense. After all, $z = -1$ corresponds to $\omega = \pi$, the maximum discrete frequency; it is thus natural for a lowpass filter to have a zero at $z = -1$ and block that highest frequency. Putting more than one zero at $z = -1$ has further approximation advantages, as Theorem 7.5 specifies, and as we will see in wavelet constructions in later chapters.

Theorem 7.5 (Polynomial reproduction) Given is an orthogonal filter bank in which the synthesis lowpass filter $G(z)$ has N zeros at $z = -1$. Then polynomial sequences up to degree $(N-1)$ and of finite support are reproduced in the lowpass approximation subspace spanned by $\{g_{n-2k}\}_{k\in\mathbb{Z}}$.



Proof. By assumption, the synthesis filter $G(z)$ is given by (7.46). From Table 7.9, the analysis highpass filter is of the form $\mp z^{L-1}G(-z)$, which means it has a factor $(1 - z^{-1})^N$, that is, it has N zeros at $z = 1$. From our discussion, this factor annihilates a polynomial input of degree $(N-1)$, resulting in $\beta = 0$ and $x_W = 0$. Because of the perfect reconstruction property, $x = x_V$, showing that the polynomial sequences are reproduced by a linear combination of $\{g_{n-2k}\}_{k\in\mathbb{Z}}$, as in (7.11a).

Polynomial reproduction by the lowpass channel and polynomial cancellation in the highpass channel are basic features in wavelet approximations. In particular, the cancellation of polynomials of degree $(N-1)$ is also called the zero-moment property of the filter (see (2.139a)):

$$m_k = \sum_n n^k h_n = 0, \qquad k = 0, 1, \ldots, N-1, \qquad (7.47)$$

that is, the kth-order moments of h up to $(N-1)$ are zero (see Exercise 7.6).

7.3 Design of Orthogonal Two-Channel Filter Banks 

To design a two-channel orthogonal filter bank, it suffices to design one orthogonal filter, the lowpass synthesis g with the z-transform $G(z)$ satisfying (7.13); we have seen how the other three filters follow (Table 7.9). The design is based on (1) finding a deterministic autocorrelation function satisfying (7.15) (it is symmetric, positive semidefinite, and has a single nonzero even-indexed coefficient); and (2) factoring that deterministic autocorrelation $A(z) = G(z)G(z^{-1})$ into its spectral factors (many factorizations are possible; see Section 2.5).¹⁰⁹

¹⁰⁹ Recall that we only consider real-coefficient filters; thus a is symmetric and not Hermitian symmetric.
We consider three different designs. The first tries to approach an ideal halfband lowpass filter, the second aims at polynomial approximation, while the third uses lattice factorization in the polyphase domain.

7.3.1 Lowpass Approximation Design 

Assume we wish to get our lowpass synthesis filter $G(e^{j\omega})$ to be as close as possible to an ideal lowpass halfband filter as in TBD. Since, according to (2.96), the deterministic autocorrelation of g can be expressed in the DTFT domain as $A(e^{j\omega}) = |G(e^{j\omega})|^2$, this deterministic autocorrelation is an ideal lowpass halfband function as well:

$$A(e^{j\omega}) = \begin{cases} 2, & |\omega| < \pi/2; \\ 0, & \text{otherwise}. \end{cases} \qquad (7.48)$$

From Table 2.5, the deterministic autocorrelation sequence is

$$a_n = \frac{1}{2\pi}\int_{-\pi/2}^{\pi/2} 2\, e^{jn\omega}\, d\omega = \operatorname{sinc}(n\pi/2), \qquad (7.49)$$



a valid deterministic autocorrelation; it has a single nonzero even-indexed coefficient ($a_0 = 1$) and is positive semidefinite. To get a realizable function, we apply a symmetric window function w that decays to zero. The new deterministic autocorrelation $a'$ is the pointwise product

$$a'_n = a_n w_n. \qquad (7.50)$$

Clearly, $a'$ is symmetric and still has a single nonzero even-indexed coefficient. However, this is not enough for $a'$ to be a deterministic autocorrelation. We can see this in the frequency domain,

$$A'(e^{j\omega}) = \frac{1}{2\pi}\, A(e^{j\omega}) * W(e^{j\omega}), \qquad (7.51)$$

where we used the convolution-in-frequency property (2.94). In general, (7.51) is no longer nonnegative for all frequencies, and thus not a valid deterministic autocorrelation. One easy way to enforce nonnegativity is to choose $W(e^{j\omega})$ itself nonnegative, for example as the deterministic autocorrelation of another window $w'$, or

$$W(e^{j\omega}) = |W'(e^{j\omega})|^2.$$

If $w'$ is of norm 1, then $w_0 = 1$, and from (7.50), $a'_0 = 1$ as well. Therefore, since $A(e^{j\omega})$ is real and nonnegative, $A'(e^{j\omega})$ will be as well. The resulting sequence $a'$ and its z-transform $A'(z)$ can then be used in spectral factorization (see Section 2.5.3) to obtain an orthogonal filter g.

Example 7.2 (Lowpass approximation design of orthogonal filters) We design a length-4 filter by the lowpass approximation method. Its deterministic autocorrelation is of length 7, with the target impulse response obtained by evaluating (7.49):

$$a = \left[\, -\tfrac{2}{3\pi},\; 0,\; \tfrac{2}{\pi},\; 1,\; \tfrac{2}{\pi},\; 0,\; -\tfrac{2}{3\pi} \,\right].$$
Figure 7.7: Orthogonal filter design based on lowpass approximation in Example 7.2. (a) Impulse response. (b) Frequency response.



For the window w, we take it to be the deterministic autocorrelation of the sequence $w'$, which is specified by $w'_n = 1/2$ for $0 \le n \le 3$, and $w'_n = 0$ otherwise:

$$w = \left[\, \tfrac14,\; \tfrac12,\; \tfrac34,\; 1,\; \tfrac34,\; \tfrac12,\; \tfrac14 \,\right].$$



Using (7.50), we obtain the new deterministic autocorrelation of the lowpass filter as

$$a' = \left[\, -\tfrac{1}{6\pi},\; 0,\; \tfrac{3}{2\pi},\; 1,\; \tfrac{3}{2\pi},\; 0,\; -\tfrac{1}{6\pi} \,\right].$$







Factoring this deterministic autocorrelation (which requires numerical polynomial root finding) gives

$$g \approx [\, 0.832,\; 0.549,\; 0.0421,\; -0.0637 \,].$$

The impulse response and frequency response of g are shown in Figure 7.7.
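The numbers in Example 7.2 can be reproduced with a few lines of Python (standard library only). The sketch below, whose function names are our own, builds the truncated target (7.49), the window w as the deterministic autocorrelation of $w'$, and the product (7.50), and then checks that the quoted three-digit spectral factor g reproduces $a'$ to within its rounding. It does not perform the spectral factorization itself.

```python
import math


def autocorrelation(x):
    """Deterministic autocorrelation a_k = sum_n x_n x_{n+k}, k = -(L-1)..(L-1)."""
    L = len(x)
    return [sum(x[n] * x[n + k] for n in range(max(0, -k), min(L, L - k)))
            for k in range(-(L - 1), L)]


def halfband_autocorrelation(n):
    """Target (7.49): a_n = sinc(n pi / 2) = sin(n pi / 2) / (n pi / 2)."""
    return 1.0 if n == 0 else math.sin(n * math.pi / 2) / (n * math.pi / 2)


lags = list(range(-3, 4))
a = [halfband_autocorrelation(n) for n in lags]   # length-7 target

# Window: deterministic autocorrelation of w'_n = 1/2, 0 <= n <= 3 (unit norm, so w_0 = 1).
w = autocorrelation([0.5] * 4)

# Windowed sequence (7.50): a'_n = a_n w_n.
a_prime = [an * wn for an, wn in zip(a, w)]

# Three-digit spectral factor quoted in Example 7.2.
g = [0.832, 0.549, 0.0421, -0.0637]

for lag, target, from_g in zip(lags, a_prime, autocorrelation(g)):
    print(f"n = {lag:+d}: a'_n = {target:+.4f}, autocorrelation of g = {from_g:+.4f}")
```

The two columns agree to about three decimal places, which is all the printed precision of g allows.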

The method presented is very simple and does not lead to the best designs. For better designs, one uses standard filter design procedures followed by adjustments to ensure positivity. For example, consider (7.51) again, and define

$$\varepsilon = \min_{\omega\in[-\pi,\pi]} A'(e^{j\omega}).$$

If $\varepsilon \ge 0$, we are done; otherwise, we simply choose a new function

$$A''(e^{j\omega}) = A'(e^{j\omega}) - \varepsilon,$$

which is now nonnegative, allowing us to perform spectral factorization. Filters designed using this method are tabulated in [160].
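To see why such a positivity adjustment can be needed, the sketch below truncates (7.49) with a plain rectangular window (our illustrative choice, which is not of the form $|W'|^2$), estimates $\varepsilon$ on a dense frequency grid (an approximation of the true minimum), and applies the shift $A' - \varepsilon$. As an assumption of this sketch, a normalization step that the adjustment above does not spell out, it also divides by $(1-\varepsilon)$ so that the zeroth coefficient is again 1 and $A''(z) + A''(-z) = 2$ still holds.

```python
import math


def dtft_symmetric(a_half, omega):
    """A(e^{jw}) = a_0 + 2 sum_{n>=1} a_n cos(n w) for a real, symmetric a.

    a_half holds a_0, a_1, ..., a_M (the n >= 0 half of the sequence).
    """
    return a_half[0] + 2 * sum(a_half[n] * math.cos(n * omega) for n in range(1, len(a_half)))


# (7.49) truncated to |n| <= 3 by a rectangular window (illustrative choice).
a_half = [1.0, 2 / math.pi, 0.0, -2 / (3 * math.pi)]

# Approximate the minimum over [-pi, pi] on a dense grid.
grid = [-math.pi + 2 * math.pi * i / 4096 for i in range(4097)]
eps = min(dtft_symmetric(a_half, w) for w in grid)
print(f"eps = {eps:.4f}")  # negative: not a valid deterministic autocorrelation

if eps < 0:
    # Shift A'' = A' - eps (only a_0 changes), then divide by (1 - eps) so that
    # a''_0 = 1 again and A''(z) + A''(-z) = 2 keeps holding.
    a_half = [(a_half[0] - eps) / (1 - eps)] + [c / (1 - eps) for c in a_half[1:]]

print(f"new minimum = {min(dtft_symmetric(a_half, w) for w in grid):.2e}")
```

Because only $a_0$ is shifted and all coefficients are then scaled by the same factor, the even-indexed coefficients other than $a_0$ stay zero.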
7.3.2 Polynomial Approximation Design



Recall that a lowpass filter $G(z)$ with N zeros at $z = -1$ as in (7.46) reproduces polynomials up to degree $(N-1)$. Thus, the goal of this design procedure is to find a deterministic autocorrelation $A(z)$ of the form

$$A(z) = G(z)G(z^{-1}) = (1 + z^{-1})^N (1 + z)^N Q(z), \qquad (7.52)$$

with $Q(z)$ chosen such that (7.15) is satisfied, that is,

$$A(z) + A(-z) = 2, \qquad (7.53)$$

$Q(z) = Q(z^{-1})$ ($q_n$ symmetric in the time domain), and $Q(z)$ is nonnegative on the unit circle. Satisfying these conditions allows one to find a spectral factor of $A(z)$ with N zeros at $z = -1$, and this spectral factor is the desired orthogonal filter. We illustrate this procedure through an example.

Example 7.3 (Polynomial approximation design of orthogonal filters) We will design a filter g such that it reproduces linear polynomials, that is, $N = 2$:

$$A(z) = (1 + z^{-1})^2 (1 + z)^2 Q(z) = (z^{-2} + 4z^{-1} + 6 + 4z + z^2)\, Q(z).$$

Can we now find $Q(z)$ so as to satisfy (7.53), in particular, a minimum-degree solution? We try with (remember $q_n$ is symmetric)

$$Q(z) = az + b + az^{-1}$$

and compute $A(z)$ as

$$A(z) = a(z^3 + z^{-3}) + (4a + b)(z^2 + z^{-2}) + (7a + 4b)(z + z^{-1}) + (8a + 6b).$$

To satisfy (7.53), $A(z)$ must have a single nonzero even-indexed coefficient. We thus need to solve the following pair of equations:

$$4a + b = 0, \qquad 8a + 6b = 1,$$

yielding $a = -1/16$ and $b = 1/4$. Thus, our candidate factor is

$$Q(z) = \frac14\left(-\frac14 z^{-1} + 1 - \frac14 z\right).$$

It remains to check whether $Q(e^{j\omega})$ is nonnegative:

$$Q(e^{j\omega}) = \frac14\left(1 - \frac14\left(e^{j\omega} + e^{-j\omega}\right)\right) = \frac14\left(1 - \frac12\cos\omega\right) > 0,$$

since $|\cos\omega| \le 1$. So $Q(z)$ is a valid deterministic autocorrelation and can be written as $Q(z) = R(z)R(z^{-1})$. Extracting its causal spectral factor

$$R(z) = \frac{1}{4\sqrt2}\left(1 + \sqrt3 + (1 - \sqrt3)z^{-1}\right),$$
Figure 7.8: Orthogonal filter design based on polynomial approximation in Example 7.3. (a) Impulse response. (b) Frequency response. (c) Linear x is preserved in V. (d) Only the linear portion of the quadratic x is preserved in V; the rest shows in W.



the causal orthogonal lowpass filter with 2 zeros at $z = -1$ becomes

$$G(z) = (1 + z^{-1})^2 R(z) = \frac{1}{4\sqrt2}\left[(1 + \sqrt3) + (3 + \sqrt3)z^{-1} + (3 - \sqrt3)z^{-2} + (1 - \sqrt3)z^{-3}\right].$$

This filter is one of the filters from the Daubechies family of orthogonal filters. Its impulse and frequency responses are shown in Figure 7.8. The rest of the filters in the filter bank can be found from Table 7.9.
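The arithmetic of Example 7.3 is easy to double-check numerically. The Python sketch below (standard library only; the helper `convolve` is our own) solves the 2 × 2 system, confirms that $R(z)$ is a spectral factor of $Q(z)$, forms $G(z) = (1+z^{-1})^2 R(z)$, and then lets us check orthonormality to even shifts and the double zero at $z = -1$, that is, $\sum_n (-1)^n g_n = 0$ and $\sum_n n(-1)^n g_n = 0$.

```python
import math


def convolve(x, y):
    """Full linear convolution of two coefficient lists (powers of z^-1)."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


# Example 7.3: 4a + b = 0 and 8a + 6b = 1 give a = -1/16, b = 1/4.
a = -1 / 16
b = -4 * a
assert math.isclose(8 * a + 6 * b, 1.0)

# Causal spectral factor R(z) = (1 + sqrt3 + (1 - sqrt3) z^-1) / (4 sqrt2).
s3, s2 = math.sqrt(3), math.sqrt(2)
r = [(1 + s3) / (4 * s2), (1 - s3) / (4 * s2)]

# R(z) R(z^-1) = (r0^2 + r1^2) + r0 r1 (z + z^-1) must equal Q(z) = a z + b + a z^-1.
assert math.isclose(r[0] ** 2 + r[1] ** 2, b) and math.isclose(r[0] * r[1], a)

g = convolve([1.0, 2.0, 1.0], r)   # G(z) = (1 + z^-1)^2 R(z)
print([round(c, 9) for c in g])
```

The printed taps agree with the $L = 4$ column of Table 7.3.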

In the example, we saw that solving a linear system followed by spectral factorization were the key steps. In general, for $G(z)$ with N zeros at $z = -1$, the minimum-degree $R(z)$ to obtain an orthogonal filter is of degree $(N-1)$, corresponding to N unknown coefficients. $Q(z) = R(z)R(z^{-1})$ is obtained by solving an $N \times N$ linear system (to satisfy $A(z) + A(-z) = 2$), followed by spectral factorization to produce the desired result. (It can be verified that $Q(e^{j\omega}) \ge 0$.) These steps are summarized in Table 7.2, while Table 7.3 gives filter-design examples.

Note that $A(z)$ has the following form when evaluated on the unit circle:

$$A(e^{j\omega}) = 2^N (1 + \cos\omega)^N Q(e^{j\omega}),$$

with $Q(e^{j\omega})$ real and positive. Since $A(e^{j\omega})$ and its $(2N-1)$ derivatives are zero at $\omega = \pi$, $|G(e^{j\omega})|$ and its $(N-1)$ derivatives are zero at $\omega = \pi$. Moreover, because of the quadrature mirror formula (2.208), the first $(N-1)$ derivatives of $|G(e^{j\omega})|$ are zero at $\omega = 0$ as well (where $|G(e^{j0})| = \sqrt2$). These facts are the topic of Exercise 7.8.
Step  Operation

1. Choose N, the number of zeros at $z = -1$.
2. $G(z) = (1 + z^{-1})^N R(z)$, where $R(z)$ is causal with powers $(0, -1, \ldots, -N+1)$.
3. $A(z) = (1 + z^{-1})^N (1 + z)^N Q(z)$, where $Q(z)$ is symmetric and has powers $(-(N-1), \ldots, 0, \ldots, N-1)$.
4. $A(z) + A(-z) = 2$. This leads to N linear constraints on the coefficients of $Q(z)$.
5. Solve the $N \times N$ linear system for the coefficients of $Q(z)$.
6. Take the spectral factor of $Q(z) = R(z)R(z^{-1})$ (for example, the minimum-phase factor; see Section 2.5).
7. The minimum-phase orthogonal filter is $G(z) = (1 + z^{-1})^N R(z)$.

Table 7.2: Design of orthogonal lowpass filters with maximum number of zeros at $z = -1$.
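The seven steps of Table 7.2 can be turned into a short program. The sketch below is one possible implementation in plain Python, under our own choices: exact rational arithmetic (`Fraction`) for the $N \times N$ system of step 5, the Durand-Kerner iteration (a standard root finder, substituted here for whatever numerical method one prefers) for step 6, and the minimum-phase choice of keeping the roots of $z^{N-1}Q(z)$ inside the unit circle. The scale of $R(z)$ is fixed at the end by requiring $G(1) = \sqrt2$. All function names are hypothetical.

```python
import math
from fractions import Fraction
from math import comb


def convolve(x, y):
    """Linear convolution of coefficient lists (works for int, Fraction, complex)."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def solve_linear(rows, rhs):
    """Exact Gauss-Jordan elimination (rows: n x n Fractions, rhs: n Fractions)."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(rows, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def poly_roots(coeffs, iters=500):
    """Durand-Kerner iteration; coeffs are floats in descending powers."""
    monic = [c / coeffs[0] for c in coeffs]
    roots = [(0.4 + 0.9j) ** k for k in range(len(monic) - 1)]
    for _ in range(iters):
        updated = []
        for i, ri in enumerate(roots):
            value = 0j
            for c in monic:                       # Horner evaluation
                value = value * ri + c
            denom = 1 + 0j
            for j, rj in enumerate(roots):
                if j != i:
                    denom *= ri - rj
            updated.append(ri - value / denom)
        roots = updated
    return roots


def daubechies_lowpass(N):
    """Orthogonal lowpass with N zeros at z = -1, following Table 7.2."""
    def p(k):  # coefficient of z^k in (1 + z^-1)^N (1 + z)^N = z^-N (1 + z)^(2N)
        return comb(2 * N, N + k) if abs(k) <= N else 0

    # Steps 3-4: unknowns q_0..q_{N-1} (q_{-k} = q_k); require a_0 = 1 and a_{2i} = 0.
    rows = [[Fraction(p(2 * i))] + [Fraction(p(2 * i - k) + p(2 * i + k)) for k in range(1, N)]
            for i in range(N)]
    rhs = [Fraction(1 if i == 0 else 0) for i in range(N)]
    q = solve_linear(rows, rhs)                   # step 5

    # Step 6: z^{N-1} Q(z) has degree 2N - 2; keep the roots inside the unit circle.
    q_poly = [float(q[abs(j)]) for j in range(-(N - 1), N)]
    r = [1]
    for root in poly_roots(q_poly):
        if abs(root) < 1:
            r = convolve(r, [1, -root])           # factor (1 - root z^-1)

    # Step 7: G(z) = (1 + z^-1)^N R(z), scaled so that G(1) = sqrt(2).
    g = [complex(c).real for c in convolve([comb(N, k) for k in range(N + 1)], r)]
    scale = math.sqrt(2) / sum(g)
    return [scale * c for c in g]


print([round(c, 9) for c in daubechies_lowpass(3)])
```

For N = 2 the linear system is exactly the pair of equations of Example 7.3; for larger N only the root finder introduces rounding.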





        L = 4           L = 6           L = 8             L = 10          L = 12
g_0     0.482962913     0.332670553     0.230377813309    0.160102398     0.111540743350
g_1     0.836516304     0.806891509     0.714846570553    0.603829270     0.494623890398
g_2     0.224143868     0.459877502     0.630880767930    0.724308528     0.751133908021
g_3     -0.129409522    -0.135011020    -0.027983769417   0.138428146     0.315250351709
g_4                     -0.085441274    -0.187034811719   -0.242294887    -0.226264693965
g_5                     0.035226292     0.030841381836    -0.032244870    -0.129766867567
g_6                                     0.032883011667    0.077571494     0.097501605587
g_7                                     -0.010597401785   -0.006241490    0.027522865530
g_8                                                       -0.012580752    -0.031582039318
g_9                                                       0.003335725     0.000553842201
g_10                                                                      0.004777257511
g_11                                                                      -0.001077301085

Table 7.3: Orthogonal filters with maximum number of zeros at $z = -1$ (from [39]). For a lowpass filter of even length $L = 2\ell$, there are $L/2$ zeros at $z = -1$.
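As a sanity check on Table 7.3, the Python sketch below verifies, for each tabulated filter, unit norm, orthogonality to even shifts, and the zero-moment property (7.47) for the highpass $h_n = (-1)^n g_{L-1-n}$. That particular highpass (one sign choice among those allowed by Table 7.9) is our assumption; the vanishing of the moments does not depend on the sign. The names are our own, and the tolerances reflect the 9 to 12 printed digits.

```python
TABLE_7_3 = {
    4: [0.482962913, 0.836516304, 0.224143868, -0.129409522],
    6: [0.332670553, 0.806891509, 0.459877502, -0.135011020, -0.085441274, 0.035226292],
    8: [0.230377813309, 0.714846570553, 0.630880767930, -0.027983769417,
        -0.187034811719, 0.030841381836, 0.032883011667, -0.010597401785],
    10: [0.160102398, 0.603829270, 0.724308528, 0.138428146, -0.242294887,
         -0.032244870, 0.077571494, -0.006241490, -0.012580752, 0.003335725],
    12: [0.111540743350, 0.494623890398, 0.751133908021, 0.315250351709,
         -0.226264693965, -0.129766867567, 0.097501605587, 0.027522865530,
         -0.031582039318, 0.000553842201, 0.004777257511, -0.001077301085],
}


def check_filter(g):
    """Return (norm error, worst even-shift inner product, highpass moments m_0..m_{N-1})."""
    L = len(g)
    N = L // 2                                    # L/2 zeros at z = -1
    norm_error = abs(sum(c * c for c in g) - 1.0)
    worst_shift = max((abs(sum(g[n] * g[n + 2 * k] for n in range(L - 2 * k)))
                       for k in range(1, N)), default=0.0)
    h = [(-1) ** n * g[L - 1 - n] for n in range(L)]                    # assumed sign convention
    moments = [sum(n ** k * h[n] for n in range(L)) for k in range(N)]  # (7.47)
    return norm_error, worst_shift, moments


for L, g in TABLE_7_3.items():
    norm_error, worst_shift, moments = check_filter(g)
    print(L, f"{norm_error:.1e}", f"{worst_shift:.1e}", f"{max(abs(m) for m in moments):.1e}")
```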



7.3.3 Lattice Factorization Design 

When discussing the polyphase view of filter banks in Section 7.2.4, we saw that orthogonality of a two-channel filter bank is connected to its polyphase matrix being paraunitary. The following elegant factorization result is used in the design of that paraunitary matrix:



Theorem 7.6 The polyphase matrix of any real-coefficient, causal, FIR orthogonal two-channel filter bank can be written as

$$\Phi_p(z) = U \prod_{k=1}^{K-1} \begin{bmatrix} 1 & 0 \\ 0 & z^{-1} \end{bmatrix} R_k, \qquad (7.54)$$

where U is a general unitary matrix as in (1.218) (either a rotation as in (1.220a) or a rotoinversion (1.220b)), and $R_k$, $k = 1, 2, \ldots, K-1$, are rotation matrices as in (1.220a).

Figure 7.9: Two-channel lattice factorization of paraunitary filter banks. The $2 \times 2$ blocks $R_k$ are rotation matrices, and U is a general unitary matrix (rotation or rotoinversion). The inputs are the polyphase components of the sequence x, and the outputs are the lowpass and highpass channels.



The resulting filters are of even length 2K (see Exercise 7.2). That the above structure produces an orthogonal filter bank is clear, as the corresponding polyphase matrix $\Phi_p(z)$ is paraunitary. Proving that any orthogonal filter bank can be written in the form of (7.54) is a bit more involved. It is based on the result that for two real-coefficient polynomials $P_{K-1}$ and $Q_{K-1}$ of degree $(K-1)$, with $P_{K-1}(0)\,P_{K-1}(K-1) \neq 0$ (and $P_{K-1}$, $Q_{K-1}$ power complementary as in (2.208)), there exists another pair $P_{K-2}$, $Q_{K-2}$ such that

$$\begin{bmatrix} P_{K-1}(z) \\ Q_{K-1}(z) \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} P_{K-2}(z) \\ z^{-1} Q_{K-2}(z) \end{bmatrix}. \qquad (7.55)$$

Repeatedly applying the above result to (7.39), one obtains the lattice factorization given in Theorem 7.6. The details of the proof are given in [160].

Using the factored form, designing an orthogonal filter bank amounts to choosing U and a set of angles $(\theta_0, \theta_1, \ldots, \theta_{K-1})$. For example, the Haar filter bank in lattice form amounts to keeping only the constant-matrix term, U, as in (7.42), a rotoinversion. The factored form also suggests a structure, called a lattice, convenient for hardware implementations (see Figure 7.9).

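How do we impose particular properties, such as zeros at $\omega = \pi$, or, $z = -1$, for the lowpass filter $G(z)$? Write the following set of equations:

$$G(z)\big|_{z=1} \overset{(a)}{=} \left(G_0(z^2) + z^{-1}G_1(z^2)\right)\big|_{z=1} = G_0(1) + G_1(1) \overset{(b)}{=} \sqrt2, \qquad (7.56a)$$

$$G(z)\big|_{z=-1} \overset{(c)}{=} \left(G_0(z^2) + z^{-1}G_1(z^2)\right)\big|_{z=-1} = G_0(1) - G_1(1) \overset{(d)}{=} 0, \qquad (7.56b)$$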
where (a) and (c) follow from (7.32); (b) from (7.13) and the requirement that $G(z)$ be zero at $z = -1$; and similarly for (d). We can rewrite these compactly as:

$$\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} G_0(1) \\ G_1(1) \end{bmatrix} = \begin{bmatrix} \sqrt2 \\ 0 \end{bmatrix}. \qquad (7.57)$$




The vector $[G_0(1)\;\; G_1(1)]^T$ above is just the first column of $\Phi_p(z^2)\big|_{z=1}$, which, in turn, is either a product of (1) K rotations by $(\theta_0, \theta_1, \ldots, \theta_{K-1})$, or (2) one rotoinversion by $\theta_0$ and $K-1$ rotations by $(\theta_1, \ldots, \theta_{K-1})$. The solution to the above is:

$$\sum_{k=0}^{K-1} \theta_k = 2n\pi + \frac{\pi}{4}, \qquad U \text{ is a rotation}, \qquad (7.58a)$$

$$\theta_0 - \sum_{k=1}^{K-1} \theta_k = 2n\pi + \frac{\pi}{4}, \qquad U \text{ is a rotoinversion}, \qquad (7.58b)$$

for some $n \in \mathbb{Z}$. Imposing higher-order zeros at $z = -1$, as required for higher-order polynomial reproduction, leads to more complicated algebraic constraints. As an example, choosing $\theta_0 = \pi/3$ and $\theta_1 = -\pi/12$ leads to a double zero at $z = -1$, and is thus the lattice version of the filter designed in Example 7.3 (see Exercise 7.3). In general, design problems in lattice factored form are nonlinear and thus nontrivial.
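The lattice (7.54) is straightforward to evaluate. The sketch below (Python, our own function names) assumes U is a rotation $R(\theta_0)$ with $R(\theta) = \begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}$, tracks only the first column $[G_0(z)\;\; G_1(z)]^T$ of $\Phi_p(z)$, and interleaves it into $G(z) = G_0(z^2) + z^{-1}G_1(z^2)$. The rotoinversion case is not covered. With $\theta_0 = \pi/3$, $\theta_1 = -\pi/12$ it reproduces the length-4 filter of Example 7.3, and any angles summing to $\pi/4$ give $G(-1) = 0$ as in (7.58a).

```python
import math


def rotate(theta, top, bottom):
    """Apply R(theta) = [[cos, -sin], [sin, cos]] to a column of two polynomials."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * t - s * b for t, b in zip(top, bottom)],
            [s * t + c * b for t, b in zip(top, bottom)])


def lattice_lowpass(thetas):
    """Lowpass g of length 2K from Phi_p(z) = R(theta_0) prod_k diag(1, z^-1) R(theta_k).

    Assumes U = R(theta_0) is a rotation (the rotoinversion case is not handled).
    Polyphase components are coefficient lists in powers of z^-1.
    """
    g0, g1 = rotate(thetas[-1], [1.0], [0.0])     # first column of R(theta_{K-1})
    for theta in reversed(thetas[:-1]):           # move leftward through the product
        g0, g1 = g0 + [0.0], [0.0] + g1           # diag(1, z^-1) delays G1
        g0, g1 = rotate(theta, g0, g1)
    # G(z) = G0(z^2) + z^-1 G1(z^2): interleave even and odd taps.
    return [tap for pair in zip(g0, g1) for tap in pair]


print([round(c, 9) for c in lattice_lowpass([math.pi / 3, -math.pi / 12])])
```

Since every factor is paraunitary, any choice of angles yields a unit-norm filter orthogonal to its even shifts; only the zeros at $z = -1$ require constraints on the angles.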

7.4 Biorthogonal Two-Channel Filter Banks 

While orthogonal filter banks have many attractive features, one eludes them: when restricted to real-coefficient FIR filters, solutions that are both orthonormal and linear phase do not exist except for Haar filters. This is one of the key motivations for looking beyond the orthogonal case, as well as for the popularity of biorthogonal filter banks, especially in image processing. Similarly to the orthogonal case, we want to find out how to implement biorthogonal bases using filter banks, in particular, those having certain time and frequency localization properties. From Definition 1.42, we know that a system $\{\varphi_k, \tilde\varphi_k\}$ constitutes a pair of biorthogonal bases of the Hilbert space $\ell^2(\mathbb{Z})$ if (1) they satisfy the biorthogonality constraints

$$\langle \varphi_k, \tilde\varphi_i \rangle = \delta_{k-i} \quad\leftrightarrow\quad \Phi^T \tilde\Phi = \tilde\Phi^T \Phi = I, \qquad (7.59)$$

where $\Phi$ is an infinite matrix having $\varphi_k$ as its columns, while $\tilde\Phi$ is an infinite matrix having $\tilde\varphi_k$ as its columns; and (2) it is complete:

$$x = \sum_{k\in\mathbb{Z}} X_k \varphi_k = \Phi X = \sum_{k\in\mathbb{Z}} \tilde X_k \tilde\varphi_k = \tilde\Phi \tilde X, \qquad (7.60)$$

for all $x \in \ell^2(\mathbb{Z})$, where

$$X_k = \langle \tilde\varphi_k, x \rangle \;\leftrightarrow\; X = \tilde\Phi^T x, \qquad \tilde X_k = \langle \varphi_k, x \rangle \;\leftrightarrow\; \tilde X = \Phi^T x.$$
Figure 7.10: A biorthogonal two-channel analysis/synthesis filter bank. The output is the sum of the lowpass approximation $x_V$ and its highpass counterpart $x_W$.



It is not a stretch now to imagine that, similarly to the orthogonal case, we are looking for two template basis sequences, a lowpass/highpass pair g and h, and a dual pair $\tilde g$ and $\tilde h$, so that the biorthogonality constraints (7.59) are satisfied. Under the right circumstances described in this section, such a filter bank will compute a biorthogonal expansion. Assume that, indeed, we are computing such an expansion. Start from the reconstructed output as in Figure 7.10:
$$x = x_V + x_W = \sum_{k\in\mathbb{Z}} \alpha_k\, g_{n-2k} + \sum_{k\in\mathbb{Z}} \beta_k\, h_{n-2k} = \Phi X, \qquad (7.61)$$

exactly the same as (7.7). As in (7.7), $g_{n-2k}$ and $h_{n-2k}$ are the impulse responses of the synthesis filters g and h shifted by 2k, and $\alpha_k$ and $\beta_k$ are the outputs of the analysis filter bank downsampled by 2. The basis sequences are the columns of

$$\Phi = \{\varphi_k\}_{k\in\mathbb{Z}} = \{\varphi_{2k}, \varphi_{2k+1}\}_{k\in\mathbb{Z}} = \{g_{n-2k}, h_{n-2k}\}_{k\in\mathbb{Z}}, \qquad (7.62)$$



that is, the even-indexed basis sequences are the impulse responses of the synthesis 
lowpass filter and its even shifts, while the odd-indexed basis sequences are the 
impulse responses of the synthesis highpass filter and its even shifts. 

So far, the analysis has been identical to that of orthogonal filter banks; we repeated it here for emphasis. Since we are implementing a biorthogonal expansion, the transform coefficients $\alpha_k$ and $\beta_k$ are inner products between the dual basis sequences and the input sequence: $\alpha_k = \langle x, \tilde\varphi_{2k} \rangle$, $\beta_k = \langle x, \tilde\varphi_{2k+1} \rangle$. From (2.61a),

$$\alpha_k = \langle x, \tilde\varphi_{2k} \rangle = \langle x_n, \tilde g_{2k-n} \rangle_n = (\tilde g * x)_{2k} \;\leftrightarrow\; \alpha = D_2 \tilde G x,$$

$$\beta_k = \langle x, \tilde\varphi_{2k+1} \rangle = \langle x_n, \tilde h_{2k-n} \rangle_n = (\tilde h * x)_{2k} \;\leftrightarrow\; \beta = D_2 \tilde H x,$$

that is, we can implement the computation of the expansion coefficients $\alpha_k$ and $\beta_k$ using convolutions, exactly as in the orthogonal case. We finally get

$$X = \tilde\Phi^T x.$$

From above, we see that the dual basis sequences are

$$\tilde\Phi = \{\tilde\varphi_k\}_{k\in\mathbb{Z}} = \{\tilde\varphi_{2k}, \tilde\varphi_{2k+1}\}_{k\in\mathbb{Z}} = \{\tilde g_{2k-n}, \tilde h_{2k-n}\}_{k\in\mathbb{Z}}, \qquad (7.63)$$

that is, the even-indexed dual basis sequences are the shift-reversed impulse responses of the analysis lowpass filter and its even shifts, while the odd-indexed dual basis sequences are the shift-reversed impulse responses of the analysis highpass filter and its even shifts.

We stress again that the basis sequences of $\Phi$ are the synthesis filters' impulse responses and their even shifts, while the basis sequences of $\tilde\Phi$ are the shift-reversed analysis filters' impulse responses and their even shifts. This shift reversal comes from the fact that we are implementing our inner product using a convolution. Note also that $\Phi$ and $\tilde\Phi$ are completely interchangeable.

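As opposed to the three orthonormality relations (7.8), here we have four biorthogonality relations, visualized in Figure 7.11:

$$\langle g_n, \tilde g_{2k-n} \rangle_n = \delta_k, \qquad (7.64a)$$

$$\langle h_n, \tilde h_{2k-n} \rangle_n = \delta_k, \qquad (7.64b)$$

$$\langle h_n, \tilde g_{2k-n} \rangle_n = 0, \qquad (7.64c)$$

$$\langle g_n, \tilde h_{2k-n} \rangle_n = 0. \qquad (7.64d)$$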

The purpose of this section is to explore the family of impulse responses $\{g, h\}$ and their duals $\{\tilde g, \tilde h\}$ so as to satisfy the biorthogonality constraints. This family is much larger than the orthonormal family and will contain symmetric/antisymmetric solutions, on which we will focus.

7.4.1 A Single Channel and Its Properties 

As we have done for the orthogonal case, we first discuss channels in isolation and determine what they need to satisfy. Figure 7.12 shows the biorthogonal lowpass channel, projecting the input x onto its lowpass approximation $x_V$. That lowpass approximation $x_V$ can be expressed identically to (7.12a):

$$x_V = \sum_{k\in\mathbb{Z}} \alpha_k\, g_{n-2k}. \qquad (7.65a)$$
Figure 7.11: In a biorthogonal basis, g is orthogonal to $\tilde h$, and h is orthogonal to $\tilde g$. Then, $\tilde g$ and $\tilde h$ are normalized so that the inner products with their duals are 1.



The highpass channel follows the lowpass exactly, substituting h for g, $\tilde h$ for $\tilde g$, and $x_W$ for $x_V$ (see Figure 7.12). The highpass approximation $x_W$ is

$$x_W = \sum_{k\in\mathbb{Z}} \beta_k\, h_{n-2k}. \qquad (7.65b)$$



Biorthogonality of the Lowpass Filters Since we started with a pair of biorthogonal bases, $\{g_{n-2k}, \tilde g_{2k-n}\}_{k\in\mathbb{Z}}$ satisfy the biorthogonality relations (7.64a). Similarly to the orthogonal case, these can be expressed in various domains as:

$$\langle g_n, \tilde g_{2k-n} \rangle_n = \delta_k \quad\longleftrightarrow\quad \begin{cases} D_2 \tilde G G U_2 = I & \text{(matrix view)} \\ G(z)\tilde G(z) + G(-z)\tilde G(-z) = 2 & \text{(ZT)} \\ G(e^{j\omega})\tilde G(e^{j\omega}) + G(e^{j(\omega+\pi)})\tilde G(e^{j(\omega+\pi)}) = 2 & \text{(DTFT)} \end{cases} \qquad (7.66)$$

In the matrix view, we have used linear operators (infinite matrices) as we did for the orthogonal case; it expresses the fact that the columns of $GU_2$ and the rows of $D_2\tilde G$ are biorthogonal to each other. The z-transform expression is often the defining equation of a biorthogonal filter bank, where $G(z)$ and $\tilde G(z)$ are not causal in general.



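Biorthogonality of the Highpass Filters

$$\langle h_n, \tilde h_{2k-n} \rangle_n = \delta_k \quad\longleftrightarrow\quad \begin{cases} D_2 \tilde H H U_2 = I & \text{(matrix view)} \\ H(z)\tilde H(z) + H(-z)\tilde H(-z) = 2 & \text{(ZT)} \\ H(e^{j\omega})\tilde H(e^{j\omega}) + H(e^{j(\omega+\pi)})\tilde H(e^{j(\omega+\pi)}) = 2 & \text{(DTFT)} \end{cases} \qquad (7.67)$$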



Deterministic Crosscorrelation of the Lowpass Filters In the orthogonal case, we rephrased relations as in (7.66) in terms of the deterministic autocorrelation of g;
Lowpass Channel in a Two-Channel Biorthogonal Filter Bank

Lowpass filters
Original domain     $g_n, \tilde g_n$     $\langle g_n, \tilde g_{2k-n} \rangle_n = \delta_k$
Matrix domain       $G, \tilde G$     $D_2 \tilde G G U_2 = I$
z domain            $G(z), \tilde G(z)$     $G(z)\tilde G(z) + G(-z)\tilde G(-z) = 2$
DTFT domain         $G(e^{j\omega}), \tilde G(e^{j\omega})$     $G(e^{j\omega})\tilde G(e^{j\omega}) + G(e^{j(\omega+\pi)})\tilde G(e^{j(\omega+\pi)}) = 2$
Polyphase domain    $G(z) = G_0(z^2) + z^{-1}G_1(z^2)$, $\tilde G(z) = \tilde G_0(z^2) + z\tilde G_1(z^2)$     $G_0(z)\tilde G_0(z) + G_1(z)\tilde G_1(z) = 1$

Deterministic crosscorrelation
Original domain     $c_n = \langle g_k, \tilde g_{n-k} \rangle_k$     $c_{2k} = \delta_k$
Matrix domain       $C = \tilde G G$     $D_2 C U_2 = I$
z domain            $C(z) = G(z)\tilde G(z)$     $C(z) + C(-z) = 2$
DTFT domain         $C(e^{j\omega}) = G(e^{j\omega})\tilde G(e^{j\omega})$     $C(e^{j\omega}) + C(e^{j(\omega+\pi)}) = 2$

Oblique projection onto smooth space $V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}})$
$x_V = P_V x$     $P_V = G U_2 D_2 \tilde G$

Table 7.4: Properties of the lowpass channel in a biorthogonal two-channel filter bank. Properties for the highpass channel are analogous. With $\tilde g_n = g_{-n}$, or, $\tilde G(z) = G(z^{-1})$ in the z-transform domain, the relations in this table reduce to those in Table 7.1 for the orthogonal two-channel filter bank.



here, as we have two sequences g and $\tilde g$, we express it in terms of the deterministic crosscorrelation of g and $\tilde g$:

$$\langle g_n, \tilde g_{2k-n} \rangle_n = c_{2k} = \delta_k \quad\longleftrightarrow\quad \begin{cases} D_2 C U_2 = I & \text{(matrix view)} \\ C(z) + C(-z) = 2 & \text{(ZT)} \\ C(e^{j\omega}) + C(e^{j(\omega+\pi)}) = 2 & \text{(DTFT)} \end{cases} \qquad (7.68)$$

In the above, $C = \tilde G G$ is a Toeplitz matrix with element $c_{\pm k}$ on the kth diagonal left/right from the main diagonal (see (1.228)). While this deterministic crosscorrelation will be used for design as in the orthogonal case, unlike in the orthogonal case: (1) $C(z)$ does not have to be symmetric; (2) $C(e^{j\omega})$ does not have to be positive; and (3) any factorization of $C(z)$ leads to a valid solution, that is, the roots of $C(z)$ can be arbitrarily assigned to $G(z)$ and $\tilde G(z)$.



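Deterministic Crosscorrelation of the Highpass Filters Here c and C denote the deterministic crosscorrelation of h and $\tilde h$:

$$\langle h_n, \tilde h_{2k-n} \rangle_n = c_{2k} = \delta_k \quad\longleftrightarrow\quad \begin{cases} D_2 C U_2 = I & \text{(matrix view)} \\ C(z) + C(-z) = 2 & \text{(ZT)} \\ C(e^{j\omega}) + C(e^{j(\omega+\pi)}) = 2 & \text{(DTFT)} \end{cases} \qquad (7.69)$$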
Figure 7.12: The biorthogonal lowpass channel.



Projection Property of the Lowpass Channel We now look at the lowpass channel as a composition of linear operators:

$$x_V = P_V x = G U_2 D_2 \tilde G x. \qquad (7.70)$$

While $P_V$ is a projection, it is not an orthogonal projection:

$$P_V^2 = (G U_2 D_2 \tilde G)(G U_2 D_2 \tilde G) = G U_2 D_2 \tilde G = P_V,$$

$$P_V^T = (G U_2 D_2 \tilde G)^T = \tilde G^T (U_2 D_2)^T G^T = \tilde G^T U_2 D_2 G^T \neq P_V.$$

Indeed, $P_V$ is a projection operator (it is idempotent), but it is not orthogonal (it is not self-adjoint). Its range is as in the orthogonal case:

$$V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}}). \qquad (7.71)$$

Note the interchangeable roles of g and $\tilde g$. When g is used in the synthesis, then $x_V$ lives in the above span, while if $\tilde g$ is used, it lives in the span of $\{\tilde g_{n-2k}\}_{k\in\mathbb{Z}}$. The summary of properties of the lowpass channel is given in Table 7.4.
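The oblique-projection property (7.70) can be seen on a finite example. As an assumption of this sketch, the infinite operators are replaced by their N-periodic (circular) versions with N = 16, which preserves the relevant identities here because the filters are much shorter than N; the pair is the hypothetical length-4 pair $G(z) = \frac18(1+z^{-1})^3$, $\tilde G(z) = \frac12(-z^3 + 3z^2 + 3z - 1)$ (our choice), which satisfies (7.66). The code builds $P_V = GU_2D_2\tilde G$ entrywise with exact fractions so that we can check that it is idempotent but not symmetric, with trace N/2 (its rank).

```python
from fractions import Fraction

N = 16  # period of the circular approximation (our assumption)

# Hypothetical biorthogonal pair satisfying (7.66): taps indexed by time n.
g = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
gt = {-3: Fraction(-1, 2), -2: Fraction(3, 2), -1: Fraction(3, 2), 0: Fraction(-1, 2)}


def tap(f, n):
    """f_n with time indices taken modulo N (circular convolution)."""
    return sum((c for k, c in f.items() if (k - n) % N == 0), Fraction(0))


# P_V = G U_2 D_2 G~ entrywise: (P_V)_{n,m} = sum_k g_{n-2k} gt_{2k-m}.
P = [[sum((tap(g, n - 2 * k) * tap(gt, 2 * k - m) for k in range(N // 2)), Fraction(0))
      for m in range(N)] for n in range(N)]


def matmul(A, B):
    """Exact N x N matrix product."""
    return [[sum((A[i][j] * B[j][l] for j in range(N)), Fraction(0)) for l in range(N)]
            for i in range(N)]


print("idempotent:", matmul(P, P) == P)
print("symmetric:", all(P[n][m] == P[m][n] for n in range(N) for m in range(N)))
print("trace:", sum(P[n][n] for n in range(N)))
```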



Projection Property of the Highpass Channel The highpass projection operator $P_W$ is:

$$x_W = P_W x = H U_2 D_2 \tilde H x; \qquad (7.72)$$

again a projection operator (it is idempotent), but not orthogonal (it is not self-adjoint), the same way as for $P_V$. Its range is:

$$W = \operatorname{span}(\{h_{n-2k}\}_{k\in\mathbb{Z}}). \qquad (7.73)$$



7.4.2 Complementary Channels and Their Properties 

Following the path set during the analysis of orthogonal filter banks, we now discuss what the two channels have to satisfy with respect to each other to build a biorthogonal filter bank. Given a pair of filters g and $\tilde g$ satisfying (7.66), how can we choose h and $\tilde h$ to complete the biorthogonal filter bank and thus implement a biorthogonal basis expansion? The sets of basis and dual basis sequences $\{g_{n-2k}, h_{n-2k}\}_{k\in\mathbb{Z}}$ and $\{\tilde g_{2k-n}, \tilde h_{2k-n}\}_{k\in\mathbb{Z}}$ must satisfy (7.64). We have already used (7.64a) in (7.66) and similarly for the highpass sequences in (7.67). What is left to use is that these lowpass and highpass sequences are orthogonal to each other as in (7.64c) and (7.64d):
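Orthogonality of the Lowpass and Highpass Filters

$$\langle h_n, \tilde g_{2k-n} \rangle_n = 0 \quad\longleftrightarrow\quad \begin{cases} D_2 \tilde G H U_2 = 0 & \text{(matrix view)} \\ H(z)\tilde G(z) + H(-z)\tilde G(-z) = 0 & \text{(ZT)} \\ H(e^{j\omega})\tilde G(e^{j\omega}) + H(e^{j(\omega+\pi)})\tilde G(e^{j(\omega+\pi)}) = 0 & \text{(DTFT)} \end{cases} \qquad (7.74a)$$

and similarly for g and $\tilde h$:

$$\langle g_n, \tilde h_{2k-n} \rangle_n = 0 \quad\longleftrightarrow\quad \begin{cases} D_2 \tilde H G U_2 = 0 & \text{(matrix view)} \\ G(z)\tilde H(z) + G(-z)\tilde H(-z) = 0 & \text{(ZT)} \\ G(e^{j\omega})\tilde H(e^{j\omega}) + G(e^{j(\omega+\pi)})\tilde H(e^{j(\omega+\pi)}) = 0 & \text{(DTFT)} \end{cases} \qquad (7.74b)$$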

7.4.3 Biorthogonal Two-Channel Filter Bank 

We now pull together what we have developed for biorthogonal filter banks. The 
following result gives one possible example of a biorthogonal filter bank, inspired by 
the orthogonal case. We choose the highpass synthesis filter as a modulated version 
of the lowpass, together with an odd shift. However, because of biorthogonality, it 
is the analysis lowpass that comes into play. 



Theorem 7.7 (Biorthogonal two-channel filter bank) Given are two FIR filters g and $\tilde g$ of even length $L = 2\ell$, $\ell \in \mathbb{Z}^+$, orthogonal to each other and their even shifts as in (7.66). Choose

$$h_n = (-1)^n\, \tilde g_{n-2\ell+1} \;\leftrightarrow\; H(z) = -z^{-L+1}\tilde G(-z), \qquad (7.75a)$$

$$\tilde h_n = (-1)^n\, g_{n+2\ell-1} \;\leftrightarrow\; \tilde H(z) = -z^{L-1} G(-z). \qquad (7.75b)$$

Then, sets $\{g_{n-2k}, h_{n-2k}\}_{k\in\mathbb{Z}}$ and $\{\tilde g_{2k-n}, \tilde h_{2k-n}\}_{k\in\mathbb{Z}}$ are a pair of biorthogonal bases for $\ell^2(\mathbb{Z})$, implemented by a biorthogonal filter bank specified by analysis filters $\{\tilde g, \tilde h\}$ and synthesis filters $\{g, h\}$.



Proof. To prove the theorem, we must prove that (i) $\{g_{n-2k}, h_{n-2k}\}_{k\in\mathbb{Z}}$ and $\{\tilde g_{2k-n}, \tilde h_{2k-n}\}_{k\in\mathbb{Z}}$ are biorthogonal sets and (ii) they are complete.

(i) To prove that $\{g_{n-2k}, h_{n-2k}\}_{k\in\mathbb{Z}}$ and $\{\tilde g_{2k-n}, \tilde h_{2k-n}\}_{k\in\mathbb{Z}}$ are biorthogonal sets, we must prove (7.64). The first condition, (7.64a), is satisfied by assumption. To prove the second, (7.64b), that is, h is orthogonal to $\tilde h$ and its even shifts, we must prove one of the conditions in (7.67). The definitions of h and $\tilde h$ in (7.75) imply

$$H(z)\tilde H(z) = G(-z)\tilde G(-z) \qquad (7.76)$$

and thus,

$$H(z)\tilde H(z) + H(-z)\tilde H(-z) = G(-z)\tilde G(-z) + G(z)\tilde G(z) \overset{(a)}{=} 2,$$
where (a) follows from (7.66).



To prove (7.64c) and (7.64d), we must prove one of the conditions in (7.74a) and (7.74b), respectively. We prove (7.64c); (7.64d) follows similarly.

$$H(z)\tilde G(z) + H(-z)\tilde G(-z) \overset{(a)}{=} -z^{-L+1}\tilde G(-z)\tilde G(z) - (-1)^{L+1} z^{-L+1}\tilde G(z)\tilde G(-z) \overset{(b)}{=} -z^{-L+1}\tilde G(-z)\tilde G(z) + z^{-L+1}\tilde G(z)\tilde G(-z) = 0,$$

where (a) follows from (7.75a); and (b) from $L = 2\ell$ being even.
(ii) To prove completeness, we prove that perfect reconstruction holds for any $x \in \ell^2(\mathbb{Z})$. What we do is find z-domain expressions for $X_V(z)$ and $X_W(z)$ and prove they sum up to $X(z)$. We start with the lowpass branch. The proof proceeds as in the orthogonal case.

$$X_V(z) = \tfrac12\, G(z)\left[\tilde G(z)X(z) + \tilde G(-z)X(-z)\right], \qquad (7.77a)$$

$$X_W(z) = \tfrac12\, H(z)\left[\tilde H(z)X(z) + \tilde H(-z)X(-z)\right]. \qquad (7.77b)$$

The output of the filter bank is the sum of $x_V$ and $x_W$:

$$X_V(z) + X_W(z) = \tfrac12 \underbrace{\left[G(z)\tilde G(-z) + H(z)\tilde H(-z)\right]}_{S(z)} X(-z) + \tfrac12 \underbrace{\left[G(z)\tilde G(z) + H(z)\tilde H(z)\right]}_{T(z)} X(z). \qquad (7.78)$$

Substituting (7.75) into the above equation, we get:

$$S(z) = G(z)\tilde G(-z) + H(z)\tilde H(-z) \overset{(a)}{=} G(z)\tilde G(-z) + \left[-z^{-L+1}\tilde G(-z)\right]\left[-(-z)^{L-1} G(z)\right] = \left[1 + (-1)^{L+1}\right] G(z)\tilde G(-z) \overset{(b)}{=} 0,$$

$$T(z) = G(z)\tilde G(z) + H(z)\tilde H(z) \overset{(c)}{=} G(z)\tilde G(z) + G(-z)\tilde G(-z) \overset{(d)}{=} 2,$$

where (a) follows from (7.75); (b) from $L = 2\ell$ being even; (c) from (7.75); and (d) from (7.66). Substituting this back into (7.78), we get

$$X_V(z) + X_W(z) = X(z), \qquad (7.79)$$

proving perfect reconstruction, or, in other words, the assertion in the theorem statement that the expansion can be implemented by a biorthogonal filter bank.
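The two identities at the heart of the proof, $S(z) = 0$ and $T(z) = 2$, can be checked exactly for a concrete pair. The sketch below uses Laurent-polynomial arithmetic on dictionaries mapping powers of z to `Fraction` coefficients (helper names are ours) and a hypothetical length-4 pair of our choosing: $G(z) = \frac18(1+z^{-1})^3$ and $\tilde G(z) = \frac12(-z^3 + 3z^2 + 3z - 1)$, whose product is the deterministic autocorrelation $A(z)$ of Example 7.3, so (7.66) holds. The highpass filters are built from (7.75).

```python
from fractions import Fraction


def mul(p, q):
    """Product of Laurent polynomials stored as {power of z: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def add(p, q):
    """Sum of Laurent polynomials, dropping zero coefficients."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}


def neg_z(p):
    """P(z) -> P(-z): the coefficient of z^e picks up a factor (-1)^e."""
    return {e: c * (-1) ** (e % 2) for e, c in p.items()}


def shift(p, k, scale=1):
    """scale * z^k * P(z)."""
    return {e + k: scale * c for e, c in p.items()}


L = 4
# Hypothetical pair (our choice): G(z) = (1 + z^-1)^3 / 8, G~(z) = (-z^3 + 3z^2 + 3z - 1) / 2.
G = {0: Fraction(1, 8), -1: Fraction(3, 8), -2: Fraction(3, 8), -3: Fraction(1, 8)}
Gt = {3: Fraction(-1, 2), 2: Fraction(3, 2), 1: Fraction(3, 2), 0: Fraction(-1, 2)}

H = shift(neg_z(Gt), -L + 1, -1)    # (7.75a): H(z) = -z^{-L+1} G~(-z)
Ht = shift(neg_z(G), L - 1, -1)     # (7.75b): H~(z) = -z^{L-1} G(-z)

lowpass = add(mul(G, Gt), mul(neg_z(G), neg_z(Gt)))   # (7.66): should be the constant 2
S = add(mul(G, neg_z(Gt)), mul(H, neg_z(Ht)))         # aliasing term S(z): should vanish
T = add(mul(G, Gt), mul(H, Ht))                       # T(z): should be the constant 2
print(lowpass, S, T)
```

An empty dictionary represents the zero polynomial, so a vanishing aliasing term shows up as `{}`.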

Note that we could have also expressed our design problem based on the synthesis (analysis) filters only.

Unlike the orthogonal case, the approximation spaces V and W are not orthogonal anymore, and therefore there exist dual spaces $\tilde V$ and $\tilde W$ spanned by $\tilde g_{-n}$ and $\tilde h_{-n}$ and their even shifts. However, V is orthogonal to $\tilde W$, and W is orthogonal to $\tilde V$. This was schematically shown in Figure 7.11. Table 7.10 summarizes various properties of biorthogonal two-channel filter banks we covered until now.
7.4.4 Polyphase View of Biorthogonal Filter Banks

We have already seen how polyphase analysis of orthogonal filter banks adds to the analysis toolbox. We now give a brief account of important polyphase notions when dealing with biorthogonal filter banks. First, recall from (7.33) that the polyphase matrix of the synthesis bank is given by¹¹⁰

$$\Phi_p(z) = \begin{bmatrix} G_0(z) & H_0(z) \\ G_1(z) & H_1(z) \end{bmatrix}, \qquad \begin{aligned} G(z) &= G_0(z^2) + z^{-1} G_1(z^2), \\ H(z) &= H_0(z^2) + z^{-1} H_1(z^2). \end{aligned} \qquad (7.80a)$$

By the same token, the polyphase matrix of the analysis bank is given by

$$\tilde\Phi_p(z) = \begin{bmatrix} \tilde G_0(z) & \tilde H_0(z) \\ \tilde G_1(z) & \tilde H_1(z) \end{bmatrix}, \qquad \begin{aligned} \tilde G(z) &= \tilde G_0(z^2) + z\, \tilde G_1(z^2), \\ \tilde H(z) &= \tilde H_0(z^2) + z\, \tilde H_1(z^2). \end{aligned} \qquad (7.80b)$$

Remember that the different polyphase decompositions of the analysis and synthesis filters are a matter of a carefully chosen convention.

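For a biorthogonal filter bank to implement a biorthogonal expansion, the following must be satisfied:

$$\Phi_p(z)\, \tilde\Phi_p^T(z) = I. \qquad (7.81)$$

From this, $\tilde\Phi_p(z) = (\Phi_p^T(z))^{-1}$, that is,

$$\tilde\Phi_p^T(z) = \Phi_p^{-1}(z) = \frac{1}{\det \Phi_p(z)} \begin{bmatrix} H_1(z) & -H_0(z) \\ -G_1(z) & G_0(z) \end{bmatrix}. \qquad (7.82)$$

Since all the matrix entries are FIR, for the analysis to be FIR as well, $\det \Phi_p(z)$ must be a monomial, that is:

$$\det \Phi_p(z) = G_0(z) H_1(z) - G_1(z) H_0(z) = z^{-k}. \qquad (7.83)$$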



In the above, we have implicitly assumed that $\Phi_p(z)$ was invertible, that is, its columns are linearly independent. This can be rephrased in filter bank terms by stating when, given $G(z)$, it is possible to find $H(z)$ such that it leads to a perfect reconstruction biorthogonal filter bank. Such a filter $H(z)$ will be called a complementary filter.

Proposition 7.8 (Complementary filters) Given a causal FIR filter $G(z)$, there exists a complementary FIR filter $H(z)$ if and only if the polyphase components of $G(z)$ are coprime (except for possible zeros at $z = \infty$).



Proof. We just saw that a necessary and sufficient condition for perfect FIR reconstruction is that $\det\Phi_p(z)$ be a monomial. Thus, coprimeness is obviously necessary, since if there were a common factor between $G_0(z)$ and $G_1(z)$, it would show up in the determinant.



110 When we say polyphase matrix, we will mean the polyphase matrix of the synthesis bank; for the analysis bank, we will explicitly say analysis polyphase matrix.
Sufficiency follows from Bézout's identity (2.282), which says that given two coprime polynomials $a(z)$ and $b(z)$, the equation $a(z)p(z) + b(z)q(z) = c(z)$ has a solution $p(z)$, $q(z)$. Fixing $a(z) = G_0(z)$, $b(z) = G_1(z)$, and $c(z) = z^{-k}$, we see that Bézout's identity takes the form of (7.83), and thus guarantees a solution $p(z) = H_1(z)$ and $q(z) = -H_0(z)$, that is, a complementary filter $H(z)$.

Note that the coprimeness of $G_0(z)$ and $G_1(z)$ is equivalent to $G(z)$ not having any zero pairs $\{z_0, -z_0\}$. Indeed, a common zero $w_0$ of $G_0$ and $G_1$ makes $G(z) = G_0(z^2) + z^{-1}G_1(z^2)$ vanish at both square roots $\pm\sqrt{w_0}$. This can be used to prove that the binomial filter $G(z) = (1 + z^{-1})^N$ always has a complementary filter (see Exercise 7.12).
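Proposition 7.8 turns into a finite test. Compute the greatest common divisor of the two polyphase components with Euclid's algorithm. Then discard factors of $z^{-1}$ (zeros at $z = \infty$, which the proposition allows) and check that what remains is a constant. The sketch below does this in exact rational arithmetic. It assumes causal filters stored as coefficient lists in powers of $z^{-1}$, and its function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Copy p as Fractions and drop zero coefficients of the highest powers of z^{-1}."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_mod(a, b):
    """Remainder of a divided by b (polynomials in w = z^{-1}; b must be nonzero)."""
    a, b = trim(a), trim(b)
    while a and len(a) >= len(b):
        coef = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, bc in enumerate(b):
            a[i + shift] -= coef * bc
        a = trim(a)
    return a

def poly_gcd(a, b):
    """Euclid's algorithm; the result is defined up to a constant factor."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return a

def has_complementary(g):
    """True if the polyphase components of the causal FIR filter g are coprime,
    ignoring common factors of z^{-1} (zeros at z = infinity)."""
    g0, g1 = g[0::2], g[1::2]
    d = poly_gcd(g0, g1)
    while d and d[0] == 0:          # strip factors of w = z^{-1}
        d.pop(0)
    return len(d) == 1

print(has_complementary([1, 3, 3, 1]))   # (1 + z^{-1})^3
print(has_complementary([1, 0, -1]))     # 1 - z^{-2}: zero pair {1, -1}
print(has_complementary([1, 1, 1, 1]))   # (1 + z^{-1})(1 + z^{-2}): zero pair {j, -j}
```

The binomial filter passes the test. The two other filters fail because each has a zero pair $\{z_0, -z_0\}$, exactly as the note above predicts.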

The counterparts to Theorem 7.3 and Corollary 7.4 for orthogonal filter banks are the following theorem and corollary for biorthogonal ones (we state these without proof):

Theorem 7.9 (Positive definite matrix and biorthogonal basis) Given a filter bank implementing a biorthogonal basis for $\ell^2(\mathbb{Z})$ and its associated polyphase matrix $\Phi_p(e^{j\omega})$, the matrix $\Phi_p(e^{j\omega})\,\Phi_p^T(e^{-j\omega})$ is positive definite.

Corollary 7.10 (Filtered deterministic autocorrelation matrix is positive semidefinite) Let $\Phi_p(e^{j\omega})$ be a $2 \times 2$ polyphase matrix such that $\Phi_p(e^{j\omega})\,\Phi_p^T(e^{-j\omega})$ is positive definite. Then the filtered deterministic autocorrelation matrix, $A_{p,\alpha}(e^{j\omega})$, is positive semidefinite.



7.4.5 Linear-Phase Two-Channel Filter Banks

We started this section by saying that one of the reasons we go through the trouble of analyzing and constructing two-channel biorthogonal filter banks is because they allow us to obtain real-coefficient FIR filters with linear phase.¹¹¹ Thus, we now do just that: we build perfect reconstruction filter banks where all the filters involved are linear phase. Linear-phase filters were defined in (2.106).

As was true for orthogonal filters, not all lengths of filters are possible if we want to have a linear-phase filter bank. This is summarized in the following proposition, the proof of which is left as Exercise 7.14.



Proposition 7.11 In a two-channel, perfect reconstruction filter bank where all filters are linear phase, the synthesis filters have one of the following forms:

(i) Both filters are odd-length symmetric, the lengths differing by an odd multiple of 2.

(ii) One filter is symmetric and the other is antisymmetric; both lengths are even, and are equal or differ by an even multiple of 2.



(iii) One filter is of odd length, the other one of even length; both have all zeros on the unit circle. Either both filters are symmetric, or one is symmetric and the other one is antisymmetric.

111 If we allow filters to have complex-valued coefficients or if we lift the restriction of two channels, linear phase and orthogonality can be satisfied simultaneously.



Our next task is to show that indeed, it is not possible to have an orthogonal 
filter bank with linear-phase filters if we restrict ourselves to the two-channel, FIR, 
real-coefficient case: 



Proposition 7.12 The only two-channel perfect reconstruction orthogonal filter 
bank with real-coefficient FIR linear-phase filters is the Haar filter bank. 



Proof. In orthogonal filter banks, (7.40)–(7.41) hold, and the filters are of even length. Therefore, following Proposition 7.11, one filter is symmetric and the other antisymmetric. Take the symmetric one, $G(z)$ for example,

$$
\begin{aligned}
G(z) &\overset{(a)}{=} G_0(z^2) + z^{-1}G_1(z^2)\\
&\overset{(b)}{=} z^{-L+1}G(z^{-1}) = z^{-L+1}\left(G_0(z^{-2}) + z\,G_1(z^{-2})\right)\\
&\overset{(c)}{=} z^{-L+2}G_1(z^{-2}) + z^{-1}\left(z^{-L+2}G_0(z^{-2})\right),
\end{aligned}
$$

where (a) and (c) follow from (7.32), and (b) from (2.152). This further means that for the polyphase components, the following hold:

$$G_0(z) = z^{-L/2+1}G_1(z^{-1}), \qquad G_1(z) = z^{-L/2+1}G_0(z^{-1}). \tag{7.84}$$

Substituting (7.84) into (7.40), we obtain

$$G_0(z)\,G_0(z^{-1}) = \tfrac{1}{2}.$$

The only FIR, real-coefficient polynomial satisfying the above is

$$G_0(z) = \tfrac{1}{\sqrt{2}}\,z^{-m}.$$

Performing a similar analysis for $G_1(z)$, we get that $G_1(z) = \tfrac{1}{\sqrt{2}}z^{-k}$, and

$$G(z) = \tfrac{1}{\sqrt{2}}\left(z^{-2m} + z^{-2k-1}\right), \qquad H(z) = G(-z),$$

yielding Haar filters ($m = k = 0$) or trivial variations thereof.

While the outstanding features of the Haar filters make it a very special solution, Proposition 7.12 is a fundamentally negative result, as the Haar filters have poor frequency localization and can reproduce only constant sequences (polynomials of degree 0).

7.5 Design of Biorthogonal Two-Channel Filter Banks

Given that biorthogonal filters are less constrained than their orthogonal counterparts, the design space is much more open. In both cases, one factors a Laurent polynomial¹¹² $C(z)$ satisfying $C(z) + C(-z) = 2$ as in (7.68). In the orthogonal case, $C(z)$ was a deterministic autocorrelation, while in the biorthogonal case, it is a deterministic crosscorrelation and thus more general. In addition, the orthogonal case requires spectral factorization (square root), while in the biorthogonal case, any factorization will do. While the factorization method is not the only approach, it is the most common. Other approaches include the complementary filter design method and the lifting design method. In the former, a desired filter is complemented so as to obtain a perfect reconstruction filter bank. In the latter, a structure akin to a lattice is used to guarantee perfect reconstruction as well as other desirable properties.

112 A Laurent polynomial is a polynomial with both positive and negative powers; see Appendix 2.B.1.

7.5.1 Factorization Design 

From (7.66)–(7.68), $C(z)$ satisfying $C(z) + C(-z) = 2$ can be factored into

$$C(z) = G(z)\,\tilde G(z),$$

where $G(z)$ is the synthesis and $\tilde G(z)$ the analysis lowpass filter (or vice versa, since the roles are dual). The most common designs use the same $C(z)$ as those used in orthogonal filter banks, for example, those with a maximum number of zeros at $z = -1$, performing the factorization so that the resulting filters have linear phase.

Example 7.4 (Biorthogonal filter bank with linear-phase filters) We reconsider Example 7.3, in particular $C(z)$ given by

$$C(z) = (1+z^{-1})^2(1+z)^2\,\tfrac{1}{4}\left(-\tfrac{1}{4}z^{-1} + 1 - \tfrac{1}{4}z\right),$$

which satisfies $C(z) + C(-z) = 2$ by construction. This also means it satisfies (7.66) for any factorization of $C(z)$ into $G(z)\tilde G(z)$. Note that we can add factors $z$ or $z^{-1}$ in one filter, as long as we cancel them in the other; this is useful for obtaining purely causal/anticausal solutions. One possible factorization is

$$
\begin{aligned}
G(z) &= z^{-1}(1+z^{-1})^2(1+z) = (1+z^{-1})^3 = 1 + 3z^{-1} + 3z^{-2} + z^{-3},\\
\tilde G(z) &= z(1+z)\,\tfrac{1}{4}\left(-\tfrac{1}{4}z^{-1} + 1 - \tfrac{1}{4}z\right) = \tfrac{1}{16}\left(-1 + 3z + 3z^2 - z^3\right).
\end{aligned}
$$

The other filters follow from (7.75), with $L = 2\ell = 4$:

$$
\begin{aligned}
H(z) &= -z^{-3}\,\tfrac{1}{16}\left(-1 - 3z + 3z^2 + z^3\right) = \tfrac{1}{16}\left(-1 - 3z^{-1} + 3z^{-2} + z^{-3}\right),\\
\tilde H(z) &= -z^{3}\left(1 - 3z^{-1} + 3z^{-2} - z^{-3}\right) = 1 - 3z + 3z^2 - z^3.
\end{aligned}
$$

The lowpass filters are both symmetric, while the highpass ones are antisymmetric. As $\tilde H(z) = (1-z)^3$ has three zero moments, $G(z)$ can reproduce polynomials up to degree 2, since such sequences go through the lowpass channel only.
Another possible factorization is

$$
\begin{aligned}
G(z) &= (1+z^{-1})^2(1+z)^2 = z^{-2} + 4z^{-1} + 6 + 4z + z^2,\\
\tilde G(z) &= \tfrac{1}{4}\left(-\tfrac{1}{4}z^{-1} + 1 - \tfrac{1}{4}z\right) = \tfrac{1}{16}\left(-z^{-1} + 4 - z\right),
\end{aligned}
$$

where both lowpass filters are symmetric and zero phase. The highpass filters are (with $L = 0$):

$$
\begin{aligned}
H(z) &= -\tfrac{1}{16}\,z\left(z^{-1} + 4 + z\right),\\
\tilde H(z) &= -z^{-1}\left(z^{-2} - 4z^{-1} + 6 - 4z + z^2\right),
\end{aligned}
$$

which are also symmetric, but with a phase delay of ±1 sample.
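The algebra of Example 7.4 can be checked mechanically. This Python sketch stores Laurent polynomials as dictionaries `{power of z: coefficient}`, an illustrative representation that the text itself does not use. It does four things:

- rebuilds $C(z)$;
- verifies $C(z) + C(-z) = 2$;
- confirms that both factorizations multiply back to $C(z)$;
- checks that $\tilde H(z) = 1 - 3z + 3z^2 - z^3 = (1-z)^3$, that is, three zeros at $z = 1$.

The helper names are hypothetical.

```python
from fractions import Fraction as F

def mul(a, b):
    """Product of Laurent polynomials stored as {power of z: coefficient}."""
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            out[p + q] = out.get(p + q, 0) + x * y
    return {k: v for k, v in out.items() if v != 0}

def is_halfband(c):
    """C(z) + C(-z) = 2  <=>  c_0 = 1 and c_{2k} = 0 for every k != 0."""
    return c.get(0, 0) == 1 and all(v == 0 for k, v in c.items() if k % 2 == 0 and k != 0)

one_plus_zinv = {0: 1, -1: 1}                      # 1 + z^{-1}
one_plus_z = {0: 1, 1: 1}                          # 1 + z
q = {-1: F(-1, 16), 0: F(1, 4), 1: F(-1, 16)}      # (1/4)(-(1/4)z^{-1} + 1 - (1/4)z)
C = mul(mul(mul(one_plus_zinv, one_plus_zinv), mul(one_plus_z, one_plus_z)), q)

# First factorization: G(z) = (1 + z^{-1})^3, G~(z) = (1/16)(-1 + 3z + 3z^2 - z^3)
G1 = {0: 1, -1: 3, -2: 3, -3: 1}
Gt1 = {0: F(-1, 16), 1: F(3, 16), 2: F(3, 16), 3: F(-1, 16)}
# Second factorization: G(z) = (1 + z^{-1})^2 (1 + z)^2, G~(z) = (1/16)(-z^{-1} + 4 - z)
G2 = {-2: 1, -1: 4, 0: 6, 1: 4, 2: 1}
Gt2 = {-1: F(-1, 16), 0: F(4, 16), 1: F(-1, 16)}

# Analysis highpass of the first factorization and the factor (1 - z)
Ht1 = {0: 1, 1: -3, 2: 3, 3: -1}
one_minus_z = {0: 1, 1: -1}

print(is_halfband(C))
print(mul(G1, Gt1) == C, mul(G2, Gt2) == C)
print(mul(mul(one_minus_z, one_minus_z), one_minus_z) == Ht1)
```

Both factorizations reproduce the same $C(z)$. Only the distribution of the zeros between the synthesis and analysis filters differs.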



The zeros at $z = -1$ in the synthesis lowpass filter become, following (7.75b), zeros at $z = 1$ in the analysis highpass filter. Therefore, many popular biorthogonal filters come from symmetric factorizations of $C(z)$ with a maximum number of zeros at $z = -1$.

Example 7.5 (Design of the 9/7 filter pair) The next higher-order $C(z)$ with a maximum number of zeros at $z = -1$ is of the form

$$C(z) = 2^{-8}(1+z)^3(1+z^{-1})^3\left(3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2}\right).$$

One possible factorization yields the so-called Daubechies 9/7 filter pair (see Table 7.5). These filters have odd length and even symmetry, and are part of the JPEG 2000 image compression standard.
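The halfband property of this $C(z)$ is easy to confirm in exact arithmetic. The sketch below stores Laurent polynomials as dictionaries from powers of $z$ to coefficients, and its helper names are hypothetical. It expands $2^{-8}(1+z)^3(1+z^{-1})^3(3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2})$ and lists its even-power coefficients. It checks only $C(z) + C(-z) = 2$, not any particular factorization.

```python
from fractions import Fraction

def mul(a, b):
    """Product of Laurent polynomials stored as {power of z: coefficient}."""
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            out[p + q] = out.get(p + q, 0) + x * y
    return {k: v for k, v in out.items() if v != 0}

def power(a, n):
    """a(z)^n for a nonnegative integer n."""
    out = {0: 1}
    for _ in range(n):
        out = mul(out, a)
    return out

Q = {2: 3, 1: -18, 0: 38, -1: -18, -2: 3}
C = mul(mul(power({0: 1, 1: 1}, 3), power({0: 1, -1: 1}, 3)), Q)
C = {k: Fraction(v, 256) for k, v in C.items()}      # apply the 2^{-8} factor

even = {k: v for k, v in C.items() if k % 2 == 0}
print(even)                                          # only the z^0 term should survive
print(sorted((k, 256 * v) for k, v in C.items()))    # integer coefficients of 2^8 C(z)
```

Only the constant term 1 survives among the even powers, so $C(z) + C(-z) = 2$ holds exactly.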





            Daubechies 9/7                                      LeGall 5/3
  n         $\tilde g_n$              $g_n$                      $\tilde g_n$    $g_n$
  0          0.60294901823635790       1.11508705245699400        3/4             1
 ±1          0.26686411844287230       0.59127176311424700        1/4             1/2
 ±2         -0.07822326652898785      -0.05754352622849957       -1/8
 ±3         -0.01686411844287495      -0.09127176311424948
 ±4          0.02674875741080976

Table 7.5: Biorthogonal filters used in the still-image compression standard JPEG 2000. The lowpass filters are given; the highpass filters can be derived using (7.75a)–(7.75b). The first pair is from [5] and the second from [95].
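Table 7.5 can be sanity-checked numerically. For each pair, the product $C(z) = G(z)\tilde G(z)$ of the two symmetric lowpass filters should satisfy $C(z) + C(-z) = 2$. In other words, $c_0 = 1$ and all other even-indexed coefficients vanish. The short Python sketch below measures the largest deviation from that target. The helper names are hypothetical. The check does not depend on which filter of a pair is called analysis or synthesis.

```python
def conv(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def symmetric(half):
    """Taps for n = -K..K built from [f_0, f_1, ..., f_K], using f_{-n} = f_n."""
    return half[:0:-1] + half

def halfband_error(f1, f2):
    """Largest deviation of C = F1 F2 from C(z) + C(-z) = 2 (both filters centered at n = 0)."""
    c = conv(f1, f2)
    center = (len(f1) - 1) // 2 + (len(f2) - 1) // 2
    err = 0.0
    for idx, v in enumerate(c):
        m = idx - center                         # offset from the center tap
        if m % 2 == 0:
            err = max(err, abs(v - (1.0 if m == 0 else 0.0)))
    return err

# Lowpass filters of Table 7.5
d97_a = symmetric([0.60294901823635790, 0.26686411844287230, -0.07822326652898785,
                   -0.01686411844287495, 0.02674875741080976])
d97_b = symmetric([1.11508705245699400, 0.59127176311424700, -0.05754352622849957,
                   -0.09127176311424948])
lg53_a = symmetric([3 / 4, 1 / 4, -1 / 8])
lg53_b = symmetric([1.0, 1 / 2])

print(halfband_error(d97_a, d97_b))    # tiny, limited by the tabulated digits
print(halfband_error(lg53_a, lg53_b))  # 0.0: every value is a short dyadic fraction
```

The 9/7 pair meets the condition only up to the precision of the printed coefficients. The 5/3 pair meets it exactly, which is what makes an integer-arithmetic implementation possible.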



7.5.2 Complementary Filter Design

Assume we have a desired synthesis lowpass filter $G(z)$. How can we find $\tilde G(z)$ such that we obtain a perfect reconstruction biorthogonal filter bank? It suffices to find $\tilde G(z)$ so that (7.66) is satisfied, which, according to Proposition 7.8, can always be done if $G(z)$ has coprime polyphase components. Then $\tilde G(z)$ can be found by solving a linear system of equations.

Example 7.6 (Complementary filter design) Suppose

$$G(z) = \tfrac{1}{2}z + 1 + \tfrac{1}{2}z^{-1} = \tfrac{1}{2}(1+z)(1+z^{-1}).$$

We would like to find $\tilde G(z)$ such that $C(z) = G(z)\tilde G(z)$ satisfies $C(z) + C(-z) = 2$. (It is easy to verify that the polyphase components of $G(z)$ are coprime, so such a $\tilde G(z)$ should exist. We exclude the trivial solution $\tilde G(z) = 1$; it is of no interest as it has no frequency selectivity.) For a length-5 symmetric filter $\tilde G(z) = cz^2 + bz + a + bz^{-1} + cz^{-2}$, we get the following system of equations:

$$a + b = 1 \qquad\text{and}\qquad \tfrac{1}{2}b + c = 0.$$

To get a unique solution, we could, for example, impose that the filter have a zero at $z = -1$,

$$a - 2b + 2c = 0,$$

leading to $a = 6/8$, $b = 2/8$, and $c = -1/8$:

$$\tilde G(z) = \tfrac{1}{8}\left(-z^2 + 2z + 6 + 2z^{-1} - z^{-2}\right).$$

All coefficients of $(g, \tilde g)$ are integer multiples of $1/8$, making the analysis and synthesis exactly invertible even with finite-precision (binary) arithmetic. These filters are used in the JPEG 2000 image compression standard; see Table 7.5.
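The linear system of Example 7.6 is small enough to solve by hand. A generic exact solver, however, shows how the same recipe carries over to longer complementary filters. The sketch below uses Python's `fractions` module and plain Gauss–Jordan elimination; the function names are hypothetical. It then verifies that the resulting $C(z) = G(z)\tilde G(z)$ is halfband.

```python
from fractions import Fraction as F

def solve(A, y):
    """Gauss-Jordan elimination in exact rational arithmetic (A assumed nonsingular)."""
    n = len(A)
    M = [[F(v) for v in row] + [F(t)] for row, t in zip(A, y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def mul(p1, p2):
    """Product of Laurent polynomials stored as {power of z: coefficient}."""
    out = {}
    for i, x in p1.items():
        for j, y in p2.items():
            out[i + j] = out.get(i + j, 0) + x * y
    return {k: v for k, v in out.items() if v != 0}

# Unknowns (a, b, c) of G~(z) = c z^2 + b z + a + b z^{-1} + c z^{-2}
A = [[1, 1, 0],           # a + b        = 1   (c_0 = 1)
     [0, F(1, 2), 1],     # b/2 + c      = 0   (c_2 = 0)
     [1, -2, 2]]          # a - 2b + 2c  = 0   (zero at z = -1)
a, b, c = solve(A, [1, 0, 0])
print(a, b, c)            # 3/4 1/4 -1/8

G = {1: F(1, 2), 0: 1, -1: F(1, 2)}
Gt = {2: c, 1: b, 0: a, -1: b, -2: c}
C = mul(G, Gt)
print(C[0] == 1 and all(v == 0 for k, v in C.items() if k % 2 == 0 and k != 0))
```

Dropping the third equation leaves a one-parameter family of length-5 solutions. This is one concrete view of the nonuniqueness discussed next.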

As can be seen from this example, the solution for the complementary filter is highly nonunique. Not only are there solutions of different lengths (in the case above, any length $3 + 4m$, $m \in \mathbb{N}$, is possible), but even a given length has multiple solutions. It can be shown that this variety is given by the solutions of a Diophantine equation related to the polyphase components of the filter $G(z)$.

7.5.3 Lifting Design

We conclude this section with the design procedure based on lifting. While the original idea behind lifting was to build shift-varying perfect reconstruction filter banks, it has also become popular as it allows for building discrete-time bases with nonlinear operations. The trivial filter bank to start lifting is the polyphase transform, which splits the sequence into even- and odd-indexed components as in Figure 7.13. In the first lifting step, we use a prediction filter P to predict the odd coefficients from the even ones. The even coefficients remain unchanged, while the result of the prediction filter applied to the even coefficients is subtracted from the odd coefficients, yielding the highpass coefficients. In the second step, we use an update filter U to update the even coefficients based on the previously computed highpass coefficients. We start with a simple example.




Figure 7.13: The lifting filter bank, with P and U the predict and update operators, respectively.



Example 7.7 (Haar filter bank obtained by lifting) The two polyphase components of $x$ are $x_0$ (even subsequence) and $x_1$ (odd subsequence) as in (2.210). The purpose of the prediction operator P is to predict odd coefficients based on the even ones. The simplest prediction says that the odd coefficients are exactly the same as the even ones, that is, $p_n = \delta_n$. The output of the highpass branch is thus the difference $(\delta_n - \delta_{n-1})$, a reasonable outcome. The purpose of the update operator U is then to update the even coefficients based on the newly computed odd ones. As we are looking for a lowpass-like version in the other branch, the easiest is to add half of this difference to the even sequence, leading to $x_{0,n} + (x_{1,n} - x_{0,n})/2$, that is, the average $(x_{0,n} + x_{1,n})/2$. This is again a reasonable output, but this time lowpass in nature. Within scaling, it is thus clear that the choice $p_n = \delta_n$, $u_n = \tfrac{1}{2}\delta_n$ leads to the Haar filter bank.
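Example 7.7 can be mirrored in a few lines of Python. The sketch below performs the polyphase split, the predict step with $p_n = \delta_n$, and the update step with $u_n = \tfrac{1}{2}\delta_n$. It then undoes them in reverse order with flipped signs. The function names are illustrative and do not come from the text. The lowpass output is the pairwise average and the highpass output the pairwise difference, that is, the Haar filter bank up to scaling.

```python
def haar_lift_forward(x):
    """One level of Haar analysis by lifting (len(x) even); returns (lowpass, highpass)."""
    x0, x1 = x[0::2], x[1::2]                      # polyphase split (trivial filter bank)
    beta = [o - e for e, o in zip(x0, x1)]         # predict: P copies the even sample
    alpha = [e + d / 2 for e, d in zip(x0, beta)]  # update: add half the difference
    return alpha, beta

def haar_lift_inverse(alpha, beta):
    """Undo the steps in reverse order, each with its sign flipped."""
    x0 = [a - d / 2 for a, d in zip(alpha, beta)]  # undo update
    x1 = [d + e for d, e in zip(beta, x0)]         # undo predict
    x = []
    for e, o in zip(x0, x1):
        x.extend((e, o))
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
alpha, beta = haar_lift_forward(x)
print(alpha)                                # [5.0, 11.0, 7.0, 5.0]  pairwise averages
print(beta)                                 # [2.0, 2.0, -2.0, 0.0]  x[2n+1] - x[2n]
print(haar_lift_inverse(alpha, beta) == x)  # True
```

Multiplying `alpha` by $\sqrt{2}$ and dividing `beta` by $\sqrt{2}$ gives the orthonormal Haar channel outputs, up to the sign convention of the highpass filter.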



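Let us now identify the polyphase matrix $\Phi_p(z)$:

$$
\begin{aligned}
X_0(z) &= \alpha(z) - U(z)\beta(z),\\
X_1(z) &= \beta(z) + P(z)X_0(z)\\
&= \beta(z) + P(z)\left(\alpha(z) - U(z)\beta(z)\right)\\
&= P(z)\alpha(z) + \left(1 - P(z)U(z)\right)\beta(z),
\end{aligned}
$$

which we can write as

$$\begin{bmatrix} X_0(z)\\ X_1(z)\end{bmatrix} = \begin{bmatrix} 1 & -U(z)\\ P(z) & 1 - P(z)U(z)\end{bmatrix}\begin{bmatrix}\alpha(z)\\ \beta(z)\end{bmatrix} = \Phi_p(z)\begin{bmatrix}\alpha(z)\\ \beta(z)\end{bmatrix}. \tag{7.85}$$

On the analysis side, $\tilde\Phi_p(z)$ is:

$$\tilde\Phi_p(z) = \left(\Phi_p^T(z)\right)^{-1} = \begin{bmatrix} 1 - P(z)U(z) & -P(z)\\ U(z) & 1\end{bmatrix}. \tag{7.86}$$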



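As $\det\Phi_p(z) = 1$, the inverse of $\Phi_p(z)$ does not involve actual inversion (no division is needed), one of the reasons why this technique is popular. Moreover, we can write $\Phi_p$ as

$$\Phi_p(z) = \begin{bmatrix} 1 & -U(z)\\ P(z) & 1 - P(z)U(z)\end{bmatrix} = \begin{bmatrix} 1 & 0\\ P(z) & 1\end{bmatrix}\begin{bmatrix} 1 & -U(z)\\ 0 & 1\end{bmatrix}, \tag{7.87}$$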
decomposing $\Phi_p(z)$ into a product of lower and upper triangular matrices, the lifting steps. What we also see is that each matrix of this form is inverted simply by negating its off-diagonal entry:

$$\begin{bmatrix} 1 & 0\\ M & 1\end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0\\ -M & 1\end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} 1 & M\\ 0 & 1\end{bmatrix}^{-1} = \begin{bmatrix} 1 & -M\\ 0 & 1\end{bmatrix},$$

meaning that to invert these, one needs only to reverse the sequence of operations, as shown in Figure 7.13. This is why this scheme allows for nonlinear operations. Even if M is nonlinear, it never has to be inverted: each step is undone by subtracting (instead of adding) M applied to the same, unchanged input.
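The remark about nonlinear steps can be made concrete with integer arithmetic. The sketch below implements the reversible integer LeGall 5/3 lifting steps as they are commonly described for lossless JPEG 2000 coding: a predict step and an update step, each containing a floor. The rounding makes each step nonlinear, yet the inverse simply subtracts what the forward step added. The sketch assumes an even-length input and whole-point symmetric extension at the borders. The function names are hypothetical.

```python
def lift53_forward(x):
    """Reversible integer 5/3 lifting (sketch); len(x) must be even and >= 2.
    Borders use whole-point symmetric extension: x[N] = x[N-2] and d[-1] = d[0]."""
    N = len(x)
    assert N >= 2 and N % 2 == 0
    half = N // 2

    def x_at(i):
        return x[i] if i < N else x[2 * N - 2 - i]

    # predict (nonlinear because of the floor): residual of each odd sample
    d = [x[2 * k + 1] - (x[2 * k] + x_at(2 * k + 2)) // 2 for k in range(half)]
    # update (also with a floor): even samples nudged by neighbouring residuals
    s = [x[2 * k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    return s, d

def lift53_inverse(s, d):
    """Undo the update, then the predict step, with identical floors: an exact inverse."""
    half = len(s)
    N = 2 * half
    x = [0] * N
    for k in range(half):
        x[2 * k] = s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < N else x[N - 2]
        x[2 * k + 1] = d[k] + (x[2 * k] + right) // 2
    return x

x = [12, 15, 9, -3, 0, 7, 7, 30]
s, d = lift53_forward(x)
print(s, d)
print(lift53_inverse(s, d) == x)   # True: integers in, the same integers out
```

The floors are never inverted. Because the inverse recomputes each rounded quantity from values it already has, reconstruction is exact for any integer input.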

7.6 Two-Channel Filter Banks with Stochastic Inputs

Our discussion so far assumed we are dealing with deterministic sequences as inputs into our filter bank, most often those with finite energy. If the input into our filter bank is stochastic, then we must use the tools developed in Chapter 2, Section 2.8. The periodic shift variance for deterministic systems has its counterpart in wide-sense cyclostationarity. The notions of energy spectral density (2.96) (DTFT of the deterministic autocorrelation) and energy (2.98) have their counterparts in the notions of power spectral density (2.232) (DTFT of the stochastic autocorrelation) and power (2.233). We now briefly discuss the effects of a filter bank on an input WSS sequence.

Until now, we have seen various ways of characterizing systems with deterministic and stochastic inputs, among others via the deterministic and stochastic autocorrelations.

In a single-input single-output system:

(i) For a deterministic sequence, its autocorrelation is Hermitian symmetric (see (2.16)), and its z-transform can be factored as in (2.96) and (2.142); that is, it is nonnegative on the unit circle, where it is sometimes called the energy spectral density.
(ii) For a WSS sequence, the counterpart to the energy spectral density is the power spectral density given in (2.232).

In a multiple-input multiple-output system, such as a filter bank, where the multiple inputs are naturally the polyphase components of the input sequence:

(i) For a deterministic sequence, we have a matrix autocorrelation of the vector of polyphase components $[x_0 \; x_1]^T$, given by (2.214). In particular, we have seen it for the vector of expansion coefficient sequences $[\alpha \; \beta]^T$ in a two-channel filter bank earlier in this chapter, in (7.43).
(ii) For a WSS sequence $x$, we can also look at the matrix of power spectral densities of the polyphase components $[x_0 \; x_1]^T$, as in (2.245). In what follows, we analyze that matrix for the vector of expansion coefficient sequences $[\alpha \; \beta]^T$.

Filter-Bank Optimization Based on Input Statistics The area of optimizing filter banks based on input statistics is an active one. In particular, principal-component filter banks have been shown to be optimal for a wide variety of problems (we give pointers to the literature on the subject in Further Reading). For example, in parallel to our discussion in Chapter 5 on the use of the KLT, it is known that the coding gain is maximized if the channel sequences are decorrelated ($\Phi_a$ is diagonal) and $A_\alpha(e^{j\omega}) > A_\beta(e^{j\omega})$ if $\mathrm{var}(\alpha_n) > \mathrm{var}(\beta_n)$. We can diagonalize $\Phi_a$ by factoring it as $\Phi_a = Q\Lambda Q^*$, where $Q$ is the matrix of eigenvectors of $\Phi_a$.

7.7 Computational Aspects

The power of filter banks is that they are a computational tool; they implement a wide variety of bases (and frames, see Chapter 10). As the two-channel filter bank is the basic building block for many of these, we now spend some time discussing various computational concerns that arise in applications.

7.7.1 Two-Channel Filter Banks 

We start with a two-channel filter bank with synthesis filters $\{g, h\}$ and analysis filters $\{\tilde g, \tilde h\}$. For simplicity and comparison purposes, we assume that the input is of even length $M$, filters are of even length $L$, and all costs are computed per input sample. From (7.80b), the channel signals $\alpha$ and $\beta$ are

$$\alpha = \tilde g_0 * x_0 + \tilde g_1 * x_1, \tag{7.88a}$$

$$\beta = \tilde h_0 * x_0 + \tilde h_1 * x_1, \tag{7.88b}$$

where $\tilde g_{0,1}$, $\tilde h_{0,1}$ are the polyphase components of the analysis filters $\tilde g$ and $\tilde h$. We have immediately written the expression in the polyphase domain, as it is implicitly clear that it does not make sense to do the filtering first and then discard every other product (see Section 2.9.3).

In general, (7.88) amounts to four convolutions with polyphase components $x_0$ and $x_1$, each of half the original length, plus the necessary additions. Instead of using (2.266), we compute directly the cost per input sample. The four convolutions operate at half the input rate and thus, for every two input samples, we compute $4L/2 = 2L$ multiplications and $4((L/2) - 1) + 2 = 2L - 2$ additions. This leads to $L$ multiplications and $L - 1$ additions per input sample, that is, exactly the same complexity as a convolution by a single filter of size $L$. The cost is thus

$$C_{\text{biorth,time}} = 2L - 1 \sim O(L), \tag{7.89}$$

per input sample.
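The saving behind (7.88) and (7.89) can be seen on a single channel. The following sketch computes the even-indexed outputs of $f * x$ twice. The first pass filters at the full rate and discards every other sample. The second works at the low rate from the polyphase components. Both count multiplications. The sketch assumes a causal filter and the polyphase indexing stated in the docstring (in particular $x_1[m] = x[2m-1]$), which may differ from the book's convention by a shift. The names are illustrative.

```python
def conv(a, b, counter):
    """Full linear convolution; counter[0] accumulates the number of multiplications."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
            counter[0] += 1
    return out

def filter_then_downsample(f, x):
    """Wasteful reference: compute every output of f * x, then keep the even-indexed ones."""
    count = [0]
    y = conv(f, x, count)
    return y[0::2], count[0]

def polyphase_channel(f, x):
    """Even-indexed outputs of f * x computed at the low rate, as in (7.88):
    y[2n] = (f0 * x0)[n] + (f1 * x1)[n], with f0[m] = f[2m], f1[m] = f[2m+1],
    x0[m] = x[2m], and x1[m] = x[2m-1] (taking x[-1] = 0)."""
    count = [0]
    f0, f1 = f[0::2], f[1::2]
    x0, x1 = x[0::2], [0] + x[1::2]
    u = conv(f0, x0, count)
    v = conv(f1, x1, count) if f1 else []
    n_out = (len(f) + len(x)) // 2            # number of even indices in f * x
    y = [(u[n] if n < len(u) else 0) + (v[n] if n < len(v) else 0) for n in range(n_out)]
    return y, count[0]

f = [1, 2, 3, 4]
x = list(range(1, 11))
ref, n_ref = filter_then_downsample(f, x)
fast, n_fast = polyphase_channel(f, x)
print(ref == fast, n_ref, n_fast)   # True 40 22
```

For the two-channel bank, running this for both $\tilde g$ and $\tilde h$ doubles both counts. The polyphase route then costs about $L$ multiplications per input sample overall, in line with (7.89).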

If an FFT-based convolution algorithm is used, for example, overlap-add, we need four convolutions using DFTs of length $N$ as in (2.265), plus $2N$ additions. Assume for simplicity and comparison purposes that $M = L = N/2$. Then

$$C_{\text{biorth,freq}} = 16\alpha\log_2 L + 14 \sim O(\log_2 L), \tag{7.90}$$

per input sample.
In [121], a precise analysis is made involving FFTs with optimized lengths so as to minimize the operation count. Using the split-radix FFT algorithm, the number of operations becomes (for large $L$)

$$C_{\text{biorth,freq,optim}} = 4\log_2 L + O(\log_2\log_2 L),$$

again per input sample. Comparing this to $C_{\text{biorth,freq}}$ (and disregarding the constant $\alpha$), the algorithm starts to be effective for $L = 8$ and a length-16 FFT, where it achieves around 5 multiplications per sample rather than 8, and leads to improvements of an order of magnitude for large filters (such as $L = 64$ or 128). For medium-size filters ($L = 6, \ldots, 12$), a method based on fast running convolution is best (see [121]).

Let us now consider some special cases where additional savings are possible. 

Linear-Phase Filter Banks It is well known that if a filter is symmetric or antisymmetric, the number of multiplications in (7.89) can be halved by simply adding (or subtracting) the two input samples that are multiplied by the same coefficient. This trick can be used in the downsampled case as well; that is, filter banks with linear-phase filters require half the number of multiplications, or $L/2$ multiplications per input sample (the number of additions remains unchanged), for a total cost of

$$C_{\text{lp,direct}} = \tfrac{3}{2}L - 1 \sim O(L), \tag{7.91}$$

still $O(L)$ but with a savings of roughly 25% over (7.89). If the filter length is odd, the polyphase components are themselves symmetric or antisymmetric, and the saving is obvious in (7.88).
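Here is a minimal sketch of the symmetric-filter trick. For a symmetric filter, the two input samples that meet the same coefficient are added first and multiplied once. This halves the multiplications and leaves the number of additions essentially unchanged. The code works at the full rate for clarity; the same folding applies inside each polyphase branch. The function names are illustrative. For an antisymmetric filter, the pre-addition becomes a subtraction.

```python
def xv(x, i):
    """Input sample x[i], with x[i] = 0 outside 0 <= i < len(x)."""
    return x[i] if 0 <= i < len(x) else 0

def fir_direct(f, x):
    """y[n] = sum_k f[k] x[n-k] for n = 0..len(x)-1; returns (y, number of multiplications)."""
    y, mults = [], 0
    for n in range(len(x)):
        acc = 0
        for k, fk in enumerate(f):
            acc += fk * xv(x, n - k)
            mults += 1
        y.append(acc)
    return y, mults

def fir_folded(f, x):
    """Same output for a symmetric filter (f[k] == f[L-1-k]): add the two input
    samples that share a coefficient, then multiply once."""
    L = len(f)
    assert all(f[k] == f[L - 1 - k] for k in range(L)), "filter must be symmetric"
    y, mults = [], 0
    for n in range(len(x)):
        acc = 0
        for k in range(L // 2):
            acc += f[k] * (xv(x, n - k) + xv(x, n - (L - 1 - k)))
            mults += 1
        if L % 2:                                  # unpaired middle tap
            acc += f[L // 2] * xv(x, n - L // 2)
            mults += 1
        y.append(acc)
    return y, mults

f = [1, 3, 3, 1]
x = [2, -1, 4, 0, 5, 3, -2, 1]
y1, m1 = fir_direct(f, x)
y2, m2 = fir_folded(f, x)
print(y1 == y2, m1, m2)   # True 32 16
```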

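Another option is to use a linear-phase lattice factorization:

$$\Phi_p(z) = \prod_{i=1}^{N/2-1}\begin{bmatrix} 1 & 0\\ 0 & z^{-1}\end{bmatrix}\begin{bmatrix} 1 & \alpha_i\\ \alpha_i & 1\end{bmatrix}.$$

The individual $2 \times 2$ symmetric matrices can be written as (we assume $\alpha_i \neq 1$)

$$\begin{bmatrix} 1 & \alpha_i\\ \alpha_i & 1\end{bmatrix} = \frac{1 - \alpha_i}{2}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix} \frac{1 + \alpha_i}{1 - \alpha_i} & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}.$$

By gathering the scale factors together, we see that each new block in the cascade structure (which increases the length of the filters by two) adds only one multiplication. Thus, we need $L/4$ multiplications and $L - 1$ additions per input sample, for a total cost of

$$C_{\text{lp,lattice}} = \tfrac{5}{4}L - 1 \sim O(L), \tag{7.92}$$

per input sample. The savings is roughly 16% over (7.91), and 37.5% over (7.89).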



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavclets.org 



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovaccvic, and V. K. Goyal 



610 



Chapter 7. Filter Banks: Building Blocks of Time-Frequency Expansions 



Two-Channel Filter Bank      Multiplications   Additions   Cost                 Order

Biorthogonal
  Frequency                                                16α log2 L + 14      O(log2 L)
  Time                       L                 L − 1       2L − 1               O(L)
Linear phase
  Direct form                (1/2)L            L − 1       (3/2)L − 1           O(L)
  Lattice form               (1/4)L            L − 1       (5/4)L − 1           O(L)
Orthogonal
  Lattice form               (3/4)L            (3/4)L      (3/2)L               O(L)
  Denormalized lattice       (1/2)L + 1        (3/4)L      (5/4)L + 1           O(L)
QMF                          (1/2)L            (1/2)L      L                    O(L)

Table 7.6: Cost per input sample of computing various two-channel filter banks with length-L filters.



Orthogonal Filter Banks As we have seen, there exists a general form for a two-channel paraunitary matrix, given in (7.39). If $G_0(z)$ and $G_1(z)$ were of degree zero, it is clear that the matrix in (7.39) would be a rotation matrix, which can be implemented with three multiplications, as we will show shortly. It turns out that for arbitrary-degree polyphase components, terms can still be gathered into rotations, saving 25% of multiplications (at the cost of 25% more additions). This rotation property is more obvious in the lattice structure form of orthogonal filter banks (7.54), where the matrices $R_i$ can be written as:

$$\begin{bmatrix} \cos\theta_i & -\sin\theta_i\\ \sin\theta_i & \cos\theta_i\end{bmatrix} = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix}\begin{bmatrix} \cos\theta_i - \sin\theta_i & 0 & 0\\ 0 & \cos\theta_i + \sin\theta_i & 0\\ 0 & 0 & \sin\theta_i\end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & -1\end{bmatrix}.$$

Thus, only three multiplications are needed per rotation, or $3L/2$ for the whole lattice. Since the lattice works in the downsampled domain, the cost is $3L/4$ multiplications per input sample and a similar number of additions, for a total cost of

$$C_{\text{orth,lattice}} = \tfrac{3}{2}L \sim O(L), \tag{7.93}$$

per input sample. We could also denormalize the diagonal matrix in the above equation (taking out $\sin\theta_i$, for example) and gather all scale factors at the end of the lattice, leading to $(L/2 + 1)$ multiplications per input sample, and the same number of additions as before, for a total cost of

$$C_{\text{orth,lattice,denorm}} = \tfrac{5}{4}L + 1 \sim O(L), \tag{7.94}$$

per input sample.
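The three-multiplication rotation can be checked directly. The sketch below applies one lattice rotation in two ways:

- the direct form, with four multiplications;
- the three-factor form above, with three multiplications. Here $\cos\theta_i - \sin\theta_i$, $\cos\theta_i + \sin\theta_i$, and $\sin\theta_i$ are precomputed once per stage.

The function names are hypothetical.

```python
import math

def rotate_4mult(theta, x0, x1):
    """Direct form of [[c, -s], [s, c]] applied to (x0, x1): 4 multiplications, 2 additions."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x0 - s * x1, s * x0 + c * x1

def rotation_constants(theta):
    """Precompute cos - sin, cos + sin, and sin once per lattice stage."""
    c, s = math.cos(theta), math.sin(theta)
    return c - s, c + s, s

def rotate_3mult(consts, x0, x1):
    """Same rotation through the three-factor form: the right factor gives
    (x0, x1, x0 - x1), the diagonal scales them, the left factor sums them.
    3 multiplications and 3 additions per input pair."""
    c_minus_s, c_plus_s, s = consts
    t = s * (x0 - x1)                  # product shared by both outputs
    return c_minus_s * x0 + t, c_plus_s * x1 + t

theta = 0.7
x0, x1 = 1.5, -2.0
print(rotate_4mult(theta, x0, x1))
print(rotate_3mult(rotation_constants(theta), x0, x1))   # same values up to rounding
```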
QMF Filter Banks The classic QMF solution discussed in Exercise 7.19, besides using even-length linear-phase filters, forces the highpass filter to be equal to the lowpass, modulated by $(-1)^n$. The polyphase matrix is therefore:

$$\Phi_p(z) = \begin{bmatrix} G_0(z) & G_0(z)\\ G_1(z) & -G_1(z)\end{bmatrix} = \begin{bmatrix} G_0(z) & 0\\ 0 & G_1(z)\end{bmatrix}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix},$$

where $G_0$ and $G_1$ are the polyphase components of $G(z)$. The factorized form on the right indicates that the cost is halved. However, this scheme only approximates a basis expansion (perfect reconstruction) when using FIR filters. Table 7.6 summarizes the costs of various filter banks we have seen so far.

Multidimensional Filter Banks While we have not discussed multidimensional filter banks so far (some pointers are given in Further Reading), we do touch upon the cost of computing them. For example, filtering an $M \times M$ image with a filter of size $L \times L$ requires on the order of $O(M^2L^2)$ operations. If the filter is separable, that is, $G(z_1, z_2) = G_1(z_1)G_2(z_2)$, then filtering on rows and columns can be done separately and the cost is reduced to the order of $O(2M^2L)$ operations ($M$ row filterings and $M$ column filterings, each using $ML$ operations).

A multidimensional filter bank can be implemented in its polyphase form, bringing the cost down to the order of a single nondownsampled convolution, just as in the one-dimensional case. A few cases of particular interest allow further reductions in cost. For example, when both filters and downsampling are separable, the system is the direct product of one-dimensional systems, and the implementation is done separately over each dimension. Consider a two-dimensional system filtering an $M \times M$ image into four subbands using the filters $\{G(z_1)G(z_2), G(z_1)H(z_2), H(z_1)G(z_2), H(z_1)H(z_2)\}$, each of size $L \times L$, followed by separable downsampling by two in each dimension. This requires $M$ decompositions in one dimension (one for each row), followed by $M$ decompositions in the other, for a total of $O(2M^2L)$ multiplications and a similar number of additions. This is a saving of the order of $L/2$ with respect to the nonseparable case.
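Here is a small sketch of the separability argument. Filtering an $M \times M$ image with a separable $L \times L$ kernel gives the same result whether it is done directly in two dimensions or as row filtering followed by column filtering. The second route, however, needs far fewer multiplications. The code is plain Python, the names are hypothetical, and it counts multiplications for full linear convolution. It illustrates filtering only, not the downsampled four-band filter bank.

```python
def conv1d(a, f, counter):
    """Full 1-D linear convolution; counter[0] accumulates multiplications."""
    out = [0] * (len(a) + len(f) - 1)
    for i, ai in enumerate(a):
        for j, fj in enumerate(f):
            out[i + j] += ai * fj
            counter[0] += 1
    return out

def conv2d_direct(img, kernel, counter):
    """Full 2-D linear convolution with an arbitrary kernel."""
    R, C = len(img), len(img[0])
    KR, KC = len(kernel), len(kernel[0])
    out = [[0] * (C + KC - 1) for _ in range(R + KR - 1)]
    for r in range(R):
        for c in range(C):
            for i in range(KR):
                for j in range(KC):
                    out[r + i][c + j] += img[r][c] * kernel[i][j]
                    counter[0] += 1
    return out

def conv2d_separable(img, f_rows, f_cols, counter):
    """Filter every row with f_rows, then every column of the result with f_cols."""
    tmp = [conv1d(row, f_rows, counter) for row in img]
    cols = [conv1d([row[c] for row in tmp], f_cols, counter) for c in range(len(tmp[0]))]
    return [[cols[c][r] for c in range(len(cols))] for r in range(len(cols[0]))]

M = 16
img = [[(3 * r + 5 * c) % 7 for c in range(M)] for r in range(M)]
f_rows, f_cols = [1, 4, 6, 4, 1], [1, -4, 6, -4, 1]        # L = 5 in each direction
kernel = [[fc * fr for fr in f_rows] for fc in f_cols]     # G(z1, z2) = G1(z1) G2(z2)

n_d, n_s = [0], [0]
direct = conv2d_direct(img, kernel, n_d)
separable = conv2d_separable(img, f_rows, f_cols, n_s)
n_direct, n_sep = n_d[0], n_s[0]
print(direct == separable, n_direct, n_sep)   # True 6400 2880
```

The ratio $6400/2880 \approx 2.2$ is already close to the asymptotic saving $L/2 = 2.5$.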

7.7.2 Boundary Extensions

While most of the literature as well as our exposition implicitly assume infinite-length sequences, in practice this is not the case. Given an $N \times N$ image, for example, the result of processing it should be another image of the same size. In Chapter 2, we discussed the finite-length case by introducing periodic (circular) extension, where the appropriate convolution is the circular convolution and the appropriate Fourier transform is the DFT. In practice, however, periodic extension is rather artificial as it wraps the sequence around (for example, what is on the left boundary of the image would appear on the right boundary). Other extensions are possible, and while for some of them (for example, symmetric) appropriate notions of convolution and Fourier transform are available, in practice these are not used. Instead, different types of extensions are applied (zero-padding, symmetric, continuous, smooth) while still using the tools developed for the periodic extension.
Figure 7.14: Boundary extensions. (a) Original sequence $x$ of length $N$. (b) Periodic extension: $x$ is repeated with a period $N$. (c) Zero-padding extension: beyond the support, $y$ is set to zero. (d) Symmetric extension: the sequence is flipped at the boundaries to preserve continuity (half-point symmetry is shown). (e) Continuous extension: the boundary value is replicated. (f) Smooth extension: at the boundary, a polynomial extension is applied to preserve higher-order continuity.

Throughout this subsection, we assume a sequence of length $N$; also, we will be using the extension nomenclature adopted in Matlab, and will point out other names under which these extensions are known.

Periodic Extension From $x$, create a periodic $y$ as

$$y_n = x_{n \bmod N}.$$

Of those we consider here, this is the only mathematically correct extension in conjunction with the DFT. Moreover, it is simple and works for any sequence length. The drawback is that the underlying sequence is most likely not periodic, and thus, periodization creates artificial discontinuities¹¹³ at multiples of $N$; see Figure 7.14(b).

Zero-Padding Extension From $x$, create $y$ as

$$y_n = \begin{cases} x_n, & n = 0, 1, \ldots, N-1;\\ 0, & \text{otherwise}.\end{cases}$$

Again, this extension is simple and works for any sequence length. However, it too creates artificial discontinuities, as in Figure 7.14(c). Also, during the filtering process, the sequence is extended by the length of the filter (minus 1), which is often undesirable.

Symmetric Extension From $x$, create a double-length $y$ as

$$y_n = \begin{cases} x_n, & n = 0, 1, \ldots, N-1;\\ x_{2N-n-1}, & n = N, N+1, \ldots, 2N-1,\end{cases}$$

and then periodize it. As shown in Figure 7.14(d), this periodic sequence of period $2N$ does not show the artificial discontinuities of the previous two cases.¹¹⁴ However, the sequence is now twice as long, and unless carefully treated, this redundancy is hard to undo. Cases where it can be handled easily are when the filters are symmetric or antisymmetric, because the output of the filtering will be symmetric or antisymmetric as well.

There exist two versions of the symmetric extension, depending on whether whole- or half-point symmetry is used. The formulation above is called half-point symmetric because $y$ is symmetric about the half-integer index value $N - \tfrac{1}{2}$. An alternative is the whole-point symmetric $y$,

$$y_n = \begin{cases} x_n, & n = 0, 1, \ldots, N-1;\\ x_{2N-n-2}, & n = N, N+1, \ldots, 2N-2,\end{cases}$$

with even symmetry around $N - 1$.

Continuous Extension From $x$, create an extended $y$ as

$$y_n = \begin{cases} x_n, & n = 0, 1, \ldots, N-1;\\ x_{N-1}, & n = N, N+1, \ldots, 2N-1;\\ x_0, & n = -1, -2, \ldots, -N+1,\end{cases}$$

shown in Figure 7.14(e). This extension is also called boundary replication extension. It is a relatively smooth extension and is often used in practice.
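The extensions above can all be written as index maps into the original length-$N$ sequence. The sketch below uses hypothetical mode names of its own. For comparison, NumPy's `numpy.pad` offers the modes `'wrap'`, `'constant'`, `'symmetric'`, `'reflect'`, and `'edge'`. These correspond to periodic, zero-padding, half-point symmetric, whole-point symmetric, and continuous extension, respectively.

```python
def extend(x, n, mode):
    """Value at integer index n of the extension y of the length-N sequence x.
    Mode names are illustrative labels, not Matlab's."""
    N = len(x)
    if 0 <= n < N:
        return x[n]
    if mode == "periodic":            # y_n = x_{n mod N}
        return x[n % N]
    if mode == "zero":                # zero-padding
        return 0
    if mode == "half_symmetric":      # period 2N, mirror halfway between samples
        m = n % (2 * N)
        return x[m] if m < N else x[2 * N - 1 - m]
    if mode == "whole_symmetric":     # period 2N - 2, mirror on the end samples (N >= 2)
        m = n % (2 * N - 2)
        return x[m] if m < N else x[2 * N - 2 - m]
    if mode == "continuous":          # boundary replication
        return x[0] if n < 0 else x[N - 1]
    raise ValueError(f"unknown mode: {mode}")

x = [1, 2, 3, 4]
for mode in ("periodic", "zero", "half_symmetric", "whole_symmetric", "continuous"):
    print(f"{mode:16s}", [extend(x, n, mode) for n in range(-3, 7)])
```

For `x = [1, 2, 3, 4]` and indices $-3, \ldots, 6$, the half-point version repeats the end samples (`..., 2, 1, 1, 2, ...`), whereas the whole-point version does not (`..., 3, 2, 1, 2, ...`).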



113 Technically speaking, a discrete sequence cannot be continuous or discontinuous. However, if the sequence is a densely sampled version of a smooth function, periodization will destroy this smoothness.

114 It does remain discontinuous in its derivatives, however; for example, if it is linear, it will be continuous but not differentiable at the symmetry points.
Smooth Extension Another idea is to extend the sequence by polynomial extrapolation, as in Figure 7.14(f). This is only lightly motivated at this point, but after we establish polynomial approximation properties of the discrete wavelet transform in Chapter 9, it will be clear that a sequence extension by polynomial extrapolation is a way to get zeros as detail coefficients. The order of the polynomial is chosen such that, on the one hand, it gets annihilated by the zero moments of the wavelet and, on the other hand, it can be reproduced by the lowpass filter.
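To illustrate why polynomial extrapolation helps, the sketch below applies the prediction step of the LeGall 5/3 lifting scheme to a linear ramp. The step takes each odd sample minus the average of its two even neighbours, a residual that vanishes on linear sequences. The sketch assumes a first-order (linear) extrapolation for the smooth extension. It compares that extrapolation with whole-point symmetric extension at the right border, the only border this particular step needs. The helper names are illustrative.

```python
def predict_details(x, right_sample):
    """LeGall 5/3 prediction residuals d[k] = x[2k+1] - (x[2k] + x[2k+2]) / 2 for even len(x).
    The one missing sample x[N] is supplied by right_sample(x)."""
    ext = list(x) + [right_sample(x)]
    return [ext[2 * k + 1] - (ext[2 * k] + ext[2 * k + 2]) / 2 for k in range(len(x) // 2)]

def symmetric_right(x):
    """Whole-point symmetric extension: x[N] = x[N-2]."""
    return x[-2]

def linear_right(x):
    """Smooth extension of order 1 (linear extrapolation): x[N] = 2 x[N-1] - x[N-2]."""
    return 2 * x[-1] - x[-2]

ramp = [float(n) for n in range(8)]
print(predict_details(ramp, symmetric_right))   # [0.0, 0.0, 0.0, 1.0]
print(predict_details(ramp, linear_right))      # [0.0, 0.0, 0.0, 0.0]
```

The symmetric extension turns the ramp into a tent at the border, which leaves one nonzero detail coefficient. The linear extrapolation keeps the sequence linear, so every detail coefficient is zero.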
Chapter at a Glance

Our goal in this chapter was to use signal processing machinery to build discrete-time bases with structure in terms of time-frequency localization properties. Moreover, we restricted ourselves to those bases generated by two prototype sequences, one that together with its shifts covers the space of lowpass sequences, and the other that together with its shifts covers the space of highpass sequences. The signal processing tool implementing such bases is a two-channel filter bank.



Two-Channel Filter Bank

Block diagram: (figure not reproduced)

Basic characteristics
  number of channels: $M = 2$
  sampling factor: $N = 2$
  channel sequences: $\alpha_n$, $\beta_n$

Filters: synthesis (lowpass, highpass); analysis (lowpass, highpass)
  orthogonal: $g_n$, $h_n$; $g_{-n}$, $h_{-n}$
  biorthogonal: $g_n$, $h_n$; $\tilde{g}_n$, $\tilde{h}_n$
  polyphase components: $g_{0,n}, g_{1,n}$, $h_{0,n}, h_{1,n}$; $\tilde{g}_{0,n}, \tilde{g}_{1,n}$, $\tilde{h}_{0,n}, \tilde{h}_{1,n}$

Table 7.7: Two-channel filter bank.



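Haar Filter Bank

Entries: synthesis (lowpass, highpass); analysis (lowpass, highpass)
  Time domain: $g_n = (\delta_n + \delta_{n-1})/\sqrt{2}$, $h_n = (\delta_n - \delta_{n-1})/\sqrt{2}$; $g_{-n} = (\delta_n + \delta_{n+1})/\sqrt{2}$, $h_{-n} = (\delta_n - \delta_{n+1})/\sqrt{2}$
  z-domain: $G(z) = (1 + z^{-1})/\sqrt{2}$, $H(z) = (1 - z^{-1})/\sqrt{2}$; $G(z^{-1}) = (1 + z)/\sqrt{2}$, $H(z^{-1}) = (1 - z)/\sqrt{2}$
  DTFT domain: $G(e^{j\omega}) = (1 + e^{-j\omega})/\sqrt{2}$, $H(e^{j\omega}) = (1 - e^{-j\omega})/\sqrt{2}$; $G(e^{-j\omega}) = (1 + e^{j\omega})/\sqrt{2}$, $H(e^{-j\omega}) = (1 - e^{j\omega})/\sqrt{2}$

Table 7.8: Haar filter bank in various domains.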
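The Haar entries of Table 7.8 translate directly into code. A minimal sketch, assuming NumPy and an even-length input: the analysis computes $\alpha_k = \langle x_n, g_{n-2k}\rangle_n = (x_{2k} + x_{2k+1})/\sqrt{2}$ and $\beta_k = \langle x_n, h_{n-2k}\rangle_n = (x_{2k} - x_{2k+1})/\sqrt{2}$, and the synthesis forms $\sum_k \alpha_k g_{n-2k} + \beta_k h_{n-2k}$. The function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_analysis(x):
    """alpha_k = <x_n, g_{n-2k}>, beta_k = <x_n, h_{n-2k}> for the Haar filters (even-length x)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2

def haar_synthesis(alpha, beta):
    """x_n = sum_k alpha_k g_{n-2k} + beta_k h_{n-2k}."""
    x = np.empty(2 * len(alpha))
    x[0::2] = (alpha + beta) / SQRT2
    x[1::2] = (alpha - beta) / SQRT2
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
alpha, beta = haar_analysis(x)
x_rec = haar_synthesis(alpha, beta)
print(np.allclose(x_rec, x))   # True: perfect reconstruction
```

Because the Haar basis is orthonormal, the channel energies also add up to the input energy (Parseval), and a constant input produces an all-zero highpass channel.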
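Two-Channel Orthogonal Filter Bank

Relationship between lowpass and highpass filters
  Time domain: $\langle h_n, g_{n-2k}\rangle_n = 0$
  Matrix domain: $D_2 H^T G U_2 = 0$
  z domain: $G(z)H(z^{-1}) + G(-z)H(-z^{-1}) = 0$
  DTFT domain: $G(e^{j\omega})H(e^{-j\omega}) + G(e^{j(\omega+\pi)})H(e^{-j(\omega+\pi)}) = 0$
  Polyphase domain: $G_0(z)H_0(z^{-1}) + G_1(z)H_1(z^{-1}) = 0$

Sequences (lowpass, highpass)
  Time domain: $\{g_{n-2k}\}_{k\in\mathbb{Z}}$, $\{h_{n-2k}\}_{k\in\mathbb{Z}}$
  Frequency domain: (lowpass and highpass magnitude responses; figure not reproduced)

Filters: synthesis (lowpass, highpass); analysis (lowpass, highpass)
  Time domain: $g_n$, $\pm(-1)^n g_{-n+2\ell-1}$; $g_{-n}$, $\pm(-1)^n g_{n+2\ell-1}$
  z domain: $G(z)$, $\mp z^{-2\ell+1}G(-z^{-1})$; $G(z^{-1})$, $\mp z^{2\ell-1}G(-z)$
  DTFT domain: $G(e^{j\omega})$, $\mp e^{-j(2\ell-1)\omega}G(e^{-j(\omega+\pi)})$; $G(e^{-j\omega})$, $\mp e^{j(2\ell-1)\omega}G(e^{j(\omega+\pi)})$

Matrix view (basis)
  Time domain: $\Phi = [\,\cdots\;\; g_{n-2k}\;\; h_{n-2k}\;\; \cdots\,]$
  z domain: $\Phi(z) = \begin{bmatrix} G(z) & H(z) \\ G(-z) & H(-z) \end{bmatrix}$
  DTFT domain: $\Phi(e^{j\omega}) = \begin{bmatrix} G(e^{j\omega}) & H(e^{j\omega}) \\ G(e^{j(\omega+\pi)}) & H(e^{j(\omega+\pi)}) \end{bmatrix}$
  Polyphase domain: $\Phi_p(z) = \begin{bmatrix} G_0(z) & H_0(z) \\ G_1(z) & H_1(z) \end{bmatrix}$

Constraints: orthogonality relations; perfect reconstruction
  Time domain: $\Phi^T\Phi = I$; $\Phi\Phi^T = I$
  z domain: $\Phi^T(z^{-1})\Phi(z) = 2I$; $\Phi(z)\Phi^T(z^{-1}) = 2I$
  DTFT domain: $\Phi^T(e^{-j\omega})\Phi(e^{j\omega}) = 2I$; $\Phi(e^{j\omega})\Phi^T(e^{-j\omega}) = 2I$
  Polyphase domain: $\Phi_p^T(z^{-1})\Phi_p(z) = I$; $\Phi_p(z)\Phi_p^T(z^{-1}) = I$

Table 7.9: Properties of an orthogonal two-channel filter bank.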
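The relations in Table 7.9 can be checked numerically for a concrete orthogonal lowpass filter. The sketch below assumes NumPy and uses the length-4 Daubechies filter; it builds the highpass as $h_n = (-1)^n g_{L-1-n}$ (the "+" sign with $2\ell - 1 = L - 1$), then checks the time-domain orthogonality relations and power complementarity $|G(e^{j\omega})|^2 + |G(e^{j(\omega+\pi)})|^2 = 2$. The helper names are hypothetical.

```python
import numpy as np

s3 = np.sqrt(3.0)
g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # Daubechies length-4 lowpass
L = len(g)
h = (-1.0) ** np.arange(L) * g[::-1]   # h_n = (-1)^n g_{L-1-n}

def even_shift_inner_products(a, b):
    """Return {k: <a_n, b_{n-2k}>_n} for finite causal sequences a, b."""
    c = np.correlate(a, b, mode="full")        # c[m] = sum_n a[n + lag] * b[n], lag = lags[m]
    lags = np.arange(-(len(b) - 1), len(a))
    even = lags % 2 == 0
    return dict(zip((lags[even] // 2).tolist(), c[even]))

def dtft(c, w):
    """C(e^{jw}) = sum_n c_n e^{-jwn} on a grid of frequencies w."""
    return np.exp(-1j * np.outer(w, np.arange(len(c)))) @ c

ip_gg = even_shift_inner_products(g, g)   # should be delta_k
ip_hg = even_shift_inner_products(h, g)   # should be 0 for every k

w = np.linspace(0, np.pi, 9)
power = np.abs(dtft(g, w)) ** 2 + np.abs(dtft(g, w + np.pi)) ** 2
print(np.allclose(power, 2.0))            # True: power complementarity
```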
Two-Channel Biorthogonal Filter Bank

Relationship between lowpass and highpass filters
  Time domain: $\langle \tilde{h}_n, g_{n-2k}\rangle_n = 0$
  Matrix domain: $D_2 \tilde{H}^T G U_2 = 0$
  z domain: $G(z)\tilde{H}(z^{-1}) + G(-z)\tilde{H}(-z^{-1}) = 0$

Sequences (lowpass, highpass)
  Basis: $\{g_{n-2k}\}_{k\in\mathbb{Z}}$, $\{h_{n-2k}\}_{k\in\mathbb{Z}}$
  Dual basis: $\{\tilde{g}_{n-2k}\}_{k\in\mathbb{Z}}$, $\{\tilde{h}_{n-2k}\}_{k\in\mathbb{Z}}$

Filters: synthesis (lowpass, highpass)
  z domain: $G(z)$, $\mp z^{2\ell-1}\tilde{G}(-z)$

Matrix view
  Basis, z domain: $\Phi(z) = \begin{bmatrix} G(z) & H(z) \\ G(-z) & H(-z) \end{bmatrix}$
  Basis, polyphase domain: $\Phi_p(z) = \begin{bmatrix} G_0(z) & H_0(z) \\ G_1(z) & H_1(z) \end{bmatrix}$
  Dual basis, polyphase domain: $\tilde{\Phi}_p(z) = \begin{bmatrix} \tilde{G}_0(z) & \tilde{H}_0(z) \\ \tilde{G}_1(z) & \tilde{H}_1(z) \end{bmatrix}$

Constraints: biorthogonality relations; perfect reconstruction
  Time domain: $\tilde{\Phi}^T\Phi = I$; $\Phi\tilde{\Phi}^T = I$
  z domain: $\tilde{\Phi}^T(z^{-1})\Phi(z) = 2I$; $\Phi(z)\tilde{\Phi}^T(z^{-1}) = 2I$
  DTFT domain: $\tilde{\Phi}^T(e^{-j\omega})\Phi(e^{j\omega}) = 2I$

Table 7.10: Properties of a biorthogonal two-channel filter bank.
Historical Remarks

Filter banks have been popular in signal processing since the
1970s, when the question of critically-sampled filter banks, those
with the number of channel samples per unit of time conserved,
arose in the context of subband coding of speech. In that method,
a speech sequence is split into downsampled frequency bands, allowing
for more powerful compression. However, downsampling
can create a perceptually disturbing effect known as aliasing, prompting Esteban and Galand
[52] in 1977 to propose a simple and elegant aliasing-removal technique based on quadrature
mirror filters (QMF). As the QMF solution does not allow for perfect reconstruction, a flurry
of work followed to solve the problem. Mintzer [105] as well as Smith and Barnwell [135]
independently proposed an orthogonal solution in the mid-1980s. Vaidyanathan [156]
established a connection to lossless systems, unveiling the factorization and design of
paraunitary matrices [160]. For wavelet purposes, Daubechies then designed filters with a
maximum number of zeros at z = −1 [39], a solution that goes back to Herrmann's design
of maximally flat FIR filters [74]. The equivalent IIR filter design problem leads to
Butterworth filters, as derived by Herley and Vetterli [72]. Vetterli solved the biorthogonal filter
bank problem [164, 165], while Cohen, Daubechies and Feauveau [30] as well as Vetterli
and Herley [166] tackled biorthogonal filter banks with a maximum number of zeros at z = −1. The
polyphase framework was used by many authors working on filter banks, but really goes back
to earlier work on transmultiplexers by Bellanger and Daguet [8]. The realization that
perfect reconstruction subband coding can be used for perfect transmultiplexing appears
in [165]. The idea of multichannel structures that can be inverted perfectly, including with
quantization, goes back to ladder structures in filter design and implementation, in the
works of Bruekers and van den Enden, Marshall, and Shah and Kalker [20, 103, 129]. Sweldens
generalized this idea under the name of lifting [147], deriving a number of new schemes
based on this concept, including filter banks with nonlinear operators and nonuniform
sampling.

Further Reading 

Books and Textbooks A few standard textbooks on filter banks exist, written by
Vaidyanathan [158], Vetterli and Kovačević [167], and Strang and Nguyen [143], among others.

N-Channel Filter Banks One of the important and immediate generalizations of two-channel
filter banks is when we allow the number of channels to be N. Numerous options
are available, from directly designing N-channel filter banks, studied in detail by
Vaidyanathan [156, 157], to those built by cascading filter banks with different numbers
of branches, leading to almost arbitrary frequency divisions. The analysis methods
follow closely those of the two-channel filter banks, albeit with more freedom; for example,
orthogonality and linear phase are much easier to achieve at the same time. We discuss
N-channel filter banks in detail in Chapter 8, with special emphasis on local Fourier bases.

Multidimensional Filter Banks The first difference we encounter when dealing with
multidimensional filter banks is that of sampling. Regular sampling with a given density
can be accomplished using any number of sampling lattices, each having any number of
associated sampling matrices. These have been described in detail by Dubois in [47], and
have been used by Viscito and Allebach [169], Karlsson and Vetterli [84], Kovačević and
Vetterli [91], and Do and Vetterli [43], among others, to design multidimensional filter banks.
Apart from the freedom coming with different sampling schemes, the associated filters can
now be truly multidimensional, allowing for a much larger space of solutions.

IIR Filter Banks Although IIR filters are attractive because of their good frequency
selectivity and computational efficiency, they have not been used extensively, as
their implementation in a filter-bank framework comes at a cost: one side of the filter bank
is necessarily anticausal. They have found some use in image processing, as the finite length
of the input allows for storing the state in the middle of the filter bank and synthesizing
from that stored state. Coverage of IIR filter banks can be found in [118, 134, 172].

Oversampled Filter Banks Yet another generalization occurs when we allow for
redundancy, leading to overcomplete filter banks implementing frame expansions, covered
in Chapter 10. These filter banks are becoming popular in applications due to the inherent
freedom in design.

Complex-Coefficient Filter Banks This entire chapter dealt exclusively with
real-coefficient filter banks, due to their prevalence in practice. Complex-coefficient filter banks
exist, from the very early QMFs [107] to more recent ones, mostly in the form of
complex exponential-modulated local Fourier bases, discussed in Section 8.3, as well as the
redundant ones, such as Gabor frames [36, 16, 15, 14, 53], discussed in Chapter 10.

QMF Filter Banks QMF filter banks showed the true potential of filter banks, as they made it
clear that one could have nonideal filters and still split and reconstruct the input spectrum.
The excitement was further spurred by the famous linear-phase designs by Johnston [81]
in 1980. Exercise 7.19 discusses the derivation of these filters and their properties.

Time-Varying Filter Banks and Boundary Filters The periodic shift variance of filter
banks can be exploited to change a filter bank essentially every period. This was done for
years in audio coding through the so-called MDCT filter banks, discussed in Section 8.4.1.
Herley and Vetterli proposed a more formal approach in [73], by designing different filters
to be used at the boundary of a finite-length input, or at a filter-bank change.

Transmultiplexing The dual scheme to a filter bank is known as a transmultiplexer,
where two sequences are synthesized into a combined sequence from which the two parts
can be extracted perfectly. An orthogonal decomposition with many channels leads to
orthogonal frequency-division multiplexing (OFDM), the basis for many modulation schemes
used in communications, such as IEEE 802.11. The analysis of transmultiplexers uses tools
similar to those for filter banks [165], covered in Solved Exercise 7.7 for the orthogonal case,
and in Exercise 7.20 for the biorthogonal case. Exercise 7.21 considers frequency-division
multiplexing with Haar filters.

Filter Banks with Stochastic Inputs Among the wealth of filter banks available, it
is often necessary to determine which one is the most suitable for a given application. A
number of measures have been proposed, for example, quantifying shift variance of subband
energies for deterministic inputs. Similarly, in [1, 2], the author proposes, among others, a
counterpart measure based on the cyclostationarity of subband powers. We do not dwell on
these here; rather, we leave the discussion for a later chapter. Akkarakaran and Vaidyanathan
in [4] discuss bifrequency and bispectrum maps (deterministic and stochastic time-varying
autocorrelations) in filter banks and answer many relevant questions; some similar issues
are tackled by Therrien in [148]. In [159], Vaidyanathan and Akkarakaran give a review
of optimal filter banks based on input statistics. In particular, principal-component filter
banks offer optimal solutions to various problems, some of these discussed in [151] and
[152, 153].



Exercises with Solutions 

7.1. Lowpass Projection in z-Domain

In (7.18), the projection property of the lowpass channel in an orthogonal two-channel
filter bank was shown using operator notation. An alternative is to use z-transforms and
derive $X_V(z)$ as the output of the lowpass channel.

(i) Show that the transpose of the convolution operator corresponding to the filter with
z-transform G(z) is a convolution operator with the filter $G(z^{-1})$.
(ii) Using the z-domain expression for downsampling followed by upsampling, (12.193),
derive $X_V(z)$, the z-domain expression for $x_V$; as we have proved in the text, the
corresponding operator is self-adjoint.
(iii) Using the above, confirm that if $X_V(z)$ is used as input to the lowpass channel, its
output is identical, verifying idempotency.

Solution:

(i) We use the matrix notation for the convolution operator G in (12.63) and its
corresponding adjoint, $G^*$ in (12.65). It is clear that the adjoint is just the time-reversed
version of G. According to (2.137), the z-transform of the adjoint of G is then
$G(z^{-1})$.

(ii) Using (12.193) on $G(z^{-1})X(z)$,

$$X_V(z) = \tfrac{1}{2}G(z)\big(G(z^{-1})X(z) + G(-z^{-1})X(-z)\big). \qquad \text{(E7.1-1)}$$

(iii) We now show idempotency,

$$\begin{aligned}
\tfrac{1}{2}G(z)\big[G(z^{-1})X_V(z) + G(-z^{-1})X_V(-z)\big]
&= \tfrac{1}{2}G(z)\Big[G(z^{-1})\Big(\tfrac{1}{2}G(z)\big(G(z^{-1})X(z) + G(-z^{-1})X(-z)\big)\Big) \\
&\qquad + G(-z^{-1})\Big(\tfrac{1}{2}G(-z)\big(G(-z^{-1})X(-z) + G(z^{-1})X(z)\big)\Big)\Big] \\
&= \tfrac{1}{4}G(z)\big[G(z)G(z^{-1})G(z^{-1})X(z) + G(z)G(z^{-1})G(-z^{-1})X(-z) \\
&\qquad + G(-z)G(-z^{-1})G(-z^{-1})X(-z) + G(-z^{-1})G(-z)G(z^{-1})X(z)\big] \\
&= \tfrac{1}{2}G(z)\,\tfrac{1}{2}\big[G(z)G(z^{-1}) + G(-z)G(-z^{-1})\big]\big[G(z^{-1})X(z) + G(-z^{-1})X(-z)\big] \\
&\overset{(a)}{=} \tfrac{1}{2}G(z)\big(G(z^{-1})X(z) + G(-z^{-1})X(-z)\big) \overset{(b)}{=} X_V(z),
\end{aligned}$$

where (a) follows from the orthogonality of the lowpass filter, (7.13); and (b) from
the expression for $X_V(z)$ we just derived, (E7.1-1).
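The same facts can be observed on finite, periodized sequences, where the operators become matrices. The sketch below is only an illustration under that assumption (circular convolution of length N = 16, NumPy, Daubechies length-4 lowpass; the helper `circulant` is hypothetical). It checks part (i), that $G^T$ is convolution with the time-reversed taps up to a circular delay of L − 1, and that the lowpass channel $P = G U_2 D_2 G^T$ is self-adjoint and idempotent.

```python
import numpy as np

def circulant(c, N):
    """N x N matrix of circular convolution with taps c (column n holds c shifted by n)."""
    M = np.zeros((N, N))
    for n in range(N):
        for k, ck in enumerate(c):
            M[(n + k) % N, n] += ck
    return M

N = 16
s3 = np.sqrt(3.0)
g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # an orthogonal lowpass
G = circulant(g, N)
D2 = np.eye(N)[0::2]        # downsampling by 2 (N/2 x N)
U2 = D2.T                   # upsampling by 2 (N x N/2)

P = G @ U2 @ D2 @ G.T       # lowpass channel: analysis with G^T, then synthesis with G

# (i) G^T equals circular convolution with the time-reversed taps, advanced by L - 1 samples
assert np.allclose(G.T, np.roll(circulant(g[::-1], N), -(len(g) - 1), axis=0))
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True
```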

7.2. Even-Length Requirement for Orthogonality

Given is an orthogonal FIR filter with impulse response g satisfying

$$\langle g_n, g_{n-2k}\rangle_n = \delta_k.$$

Prove that its length L is necessarily an even integer.

Solution: We are given an orthogonal, causal FIR filter g of length L > 1. This implies that
$g_0 \neq 0$ and $g_{L-1} \neq 0$. The orthogonality of the filter can be expressed through its deterministic
autocorrelation as in (7.15). Since $a_n = \sum_k g_k g_{k-n}$ (see (2.61d)), we can write A(z) as

$$A(z) = \sum_{i=-(L-1)}^{L-1} a_i z^{-i}.$$

From A(z) + A(−z) = 2 we know that all even-indexed coefficients of A(z), except $a_0 = 1$, must be zero.
Since $a_{L-1} = a_{-L+1} = g_0 g_{L-1}$, and both $g_0 \neq 0$ and $g_{L-1} \neq 0$, we have $a_{L-1} = a_{-L+1} \neq 0$,
which means that this must be an odd-indexed coefficient; that is, L − 1 is odd, or, L is even.
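The argument can be mirrored numerically. This sketch (assuming NumPy; the helper names are hypothetical) computes the deterministic autocorrelation, tests the condition $a_0 = 1$, $a_{2k} = 0$ for $k \neq 0$, and confirms that the extreme coefficient at lag L − 1 equals $g_0 g_{L-1}$.

```python
import numpy as np

def autocorrelation(g):
    """Deterministic autocorrelation a_n = sum_k g_k g_{k-n}, lags -(L-1), ..., L-1."""
    return np.correlate(g, g, mode="full")

def is_orthogonal(g):
    """Check <g_n, g_{n-2k}> = delta_k, i.e. a_0 = 1 and a_{2k} = 0 for k != 0."""
    g = np.asarray(g, dtype=float)
    a = autocorrelation(g)
    L = len(g)
    lags = np.arange(-(L - 1), L)
    even = lags % 2 == 0
    return np.allclose(a[even], (lags[even] == 0).astype(float))

s3 = np.sqrt(3.0)
d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
print(is_orthogonal(d4))                     # True (L = 4)
a = autocorrelation(d4)
print(np.isclose(a[0], d4[0] * d4[-1]))      # lag -(L-1) coefficient is g_0 g_{L-1}: True

# For odd L the extreme lag L - 1 is even, so g_0 g_{L-1} would have to vanish:
print(is_orthogonal(np.ones(3) / np.sqrt(3.0)))   # False
```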
7.3. Lattice Factorization Design

Consider the lattice factorization of orthogonal filter banks (7.54).

(i) If U is a rotation matrix as in (1.220a), prove that the sum of angles must satisfy
(7.58a) for the lowpass filter to have a zero at z = −1 (ω = π).

(ii) Which condition would the angles have to satisfy in (i) if, instead of $G(z)|_{z=1} = \sqrt{2}$
as in (7.56b), it had been $G(z)|_{z=1} = -\sqrt{2}$ (simple change of phase)?

(iii) If U is a rotoinversion matrix as in (1.220b), prove that the sum of angles must
satisfy (7.58b) for the lowpass filter to have a zero at z = −1 (ω = π).

(iv) If U is a rotation matrix and K = 2 (leading to length-4 orthogonal filters), impose
two zeros at z = −1 and show that a possible solution is $\theta_0 = \pi/3$ and $\theta_1 = -\pi/12$
(the same solution as in Example 7.3, one of the filters from the Daubechies family).

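Solution: We use the lattice factorization of orthogonal filter banks (7.54).

(i) To solve this part, we must solve (7.57):

$$\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} G_0(1) \\ G_1(1) \end{bmatrix} = \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix}.$$

As $[G_0(1)\;\; G_1(1)]^T$ is just the first column of $\Phi_p(z^2)|_{z=1}$, we compute it to find:

$$\Phi_p(z^2)\big|_{z=1} = R_{\theta_0}\prod_{k=1}^{K-1}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}R_{\theta_k} = R_{\theta}, \qquad \begin{bmatrix} G_0(1) \\ G_1(1) \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad \text{(E7.3-1)}$$

where $R_\theta$ is a rotation matrix with angle $\theta = \sum_{k=0}^{K-1}\theta_k$ (intuitively, this is clear, as
a rotation by a succession of angles is equivalent to a rotation by the sum of all
angles). We thus rewrite (7.57) as

$$\begin{aligned} \cos\theta + \sin\theta &= \sqrt{2}, \\ \cos\theta - \sin\theta &= 0, \end{aligned}$$

the solution of which is (7.58a).

(ii) We repeat the above process, except that the set of equations we must solve is

$$\begin{aligned} \cos\theta + \sin\theta &= -\sqrt{2}, \\ \cos\theta - \sin\theta &= 0, \end{aligned}$$

the solution of which is $\theta = \sum_k \theta_k = 5\pi/4 + 2\pi m$, $m \in \mathbb{Z}$.

(iii) If U is a rotoinversion matrix, the total angle is $\theta = \theta_0 - \sum_{k=1}^{K-1}\theta_k$, since

$$\begin{bmatrix} \cos\theta_0 & \sin\theta_0 \\ \sin\theta_0 & -\cos\theta_0 \end{bmatrix}\begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix} = \begin{bmatrix} \cos(\theta_0-\theta_1) & \sin(\theta_0-\theta_1) \\ \sin(\theta_0-\theta_1) & -\cos(\theta_0-\theta_1) \end{bmatrix},$$

that is, a rotation followed by a rotoinversion is a rotoinversion again. Solving (E7.3-1)
with this total angle gives the desired result.

(iv) We now want to impose two zeros at z = −1. We compute

$$\Phi_p(z) = R_{\theta_0}\begin{bmatrix} 1 & 0 \\ 0 & z^{-1} \end{bmatrix}R_{\theta_1} = \begin{bmatrix} \cos\theta_0\cos\theta_1 - \sin\theta_0\sin\theta_1 z^{-1} & -\cos\theta_0\sin\theta_1 - \sin\theta_0\cos\theta_1 z^{-1} \\ \sin\theta_0\cos\theta_1 + \cos\theta_0\sin\theta_1 z^{-1} & -\sin\theta_0\sin\theta_1 + \cos\theta_0\cos\theta_1 z^{-1} \end{bmatrix}.$$

Using (7.32) and (7.33), we find

$$G(z) = \cos\theta_0\cos\theta_1 + \sin\theta_0\cos\theta_1 z^{-1} - \sin\theta_0\sin\theta_1 z^{-2} + \cos\theta_0\sin\theta_1 z^{-3}.$$

The first zero at z = −1 we have already solved for; we know that $\theta_0 + \theta_1 = \pi/4$.
We now solve for an additional zero by differentiating in the Fourier domain:

$$\frac{dG(e^{j\omega})}{d\omega}\Big|_{\omega=\pi} = j\big(\sin\theta_0\cos\theta_1 + 2\sin\theta_0\sin\theta_1 + 3\cos\theta_0\sin\theta_1\big) = 0.$$

Using trigonometric identities, we can rewrite this as

$$\tfrac{1}{2}\big(\sin(\theta_0+\theta_1) + \sin(\theta_0-\theta_1) - 2\cos(\theta_0+\theta_1) + 2\cos(\theta_0-\theta_1) + 3\sin(\theta_0+\theta_1) - 3\sin(\theta_0-\theta_1)\big) = 0,$$

which, with $\theta_0 + \theta_1 = \pi/4$, leads to

$$\frac{1}{\sqrt{2}} - \sin(\theta_0-\theta_1) + \cos(\theta_0-\theta_1) = 0,$$

or, substituting $\theta_1 = \pi/4 - \theta_0$,

$$\cos\big(2\theta_0 - \tfrac{\pi}{4}\big) - \sin\big(2\theta_0 - \tfrac{\pi}{4}\big) = -\frac{1}{\sqrt{2}}.$$

Using trigonometric identities again, we get that $2\theta_0 = 2\pi/3$, or, $\theta_0 = \pi/3$ and
$\theta_1 = -\pi/12$.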
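A quick numerical cross-check of part (iv): the sketch below (assuming NumPy; helper names hypothetical) multiplies out the rotation lattice $\Phi_p(z) = R_{\theta_0}\,\mathrm{diag}(1, z^{-1})\,R_{\theta_1}$ with polynomial entries stored as coefficient arrays in powers of $z^{-1}$, forms $G(z) = G_0(z^2) + z^{-1}G_1(z^2)$, and compares the result with the length-4 Daubechies filter.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def lattice_polyphase(thetas):
    """Phi_p(z) = R_0 * prod_{k>=1} diag(1, z^-1) R_k as an array of shape (2, 2, K);
    entry [i, j, m] is the coefficient of z^{-m}."""
    P = rotation(thetas[0])[:, :, None]
    for theta in thetas[1:]:
        P = np.concatenate([P, np.zeros((2, 2, 1))], axis=2)  # room for one more delay
        P[:, 1, :] = np.roll(P[:, 1, :], 1, axis=1)            # right-multiply by diag(1, z^-1)
        P = np.einsum("ijm,jl->ilm", P, rotation(theta))       # right-multiply by R_theta
    return P

def lowpass_from_polyphase(P):
    """G(z) = G_0(z^2) + z^-1 G_1(z^2), with G_0, G_1 the first column of Phi_p(z)."""
    g = np.zeros(2 * P.shape[2])
    g[0::2], g[1::2] = P[0, 0], P[1, 0]
    return g

g = lowpass_from_polyphase(lattice_polyphase([np.pi / 3, -np.pi / 12]))
print(np.round(g, 3))   # approximately [0.483, 0.837, 0.224, -0.129]
```

Checking $\sum_n (-1)^n g_n = 0$ and $\sum_n n(-1)^n g_n = 0$ on the result confirms the two zeros at z = −1.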

7.4. Polynomial Reproduction in Perfect Reconstruction Biorthogonal Filter Banks
Given is a biorthogonal filter bank with a synthesis lowpass filter of the form

$$G(z) = (1 + z^{-1})^N R(z). \qquad \text{(E7.4-1)}$$

Prove that in such a filter bank:

(i) The analysis highpass filter $\tilde{H}(z)$ has (N − 1) zero moments.

(ii) Polynomials of degree up to (N − 1) belong to the space $V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}})$,
that is, g and its even shifts can reproduce polynomials up to degree (N − 1).

Solution: This solution builds on the solution of Exercise 7.6.

(i) Given the synthesis filter G(z) from (E7.4-1), from (7.75b) we know that the analysis
highpass filter $\tilde{H}(z)$ can be computed as

$$\tilde{H}(z) = -z^{L-1}G(-z) = -z^{L-1}(1 - z^{-1})^N R(-z) = -z(1 - z^{-1})^N R(-z),$$

where we arbitrarily set L = 2. Following Exercise 7.6, the above filter has (N − 1)
zero moments.
(ii) Since $\tilde{H}$ has N zeros at z = 1, the highpass channel of the perfect reconstruction
filter bank outputs zero when presented with a polynomial of degree up to (N − 1)
as input. Since we have the perfect reconstruction property, we know that the
input must then be wholly preserved in the lowpass channel. Thus, polynomials
up to degree (N − 1) belong to the space $V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}})$.
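The zero-moment property used in part (i) is easy to see numerically. In this sketch (assuming NumPy; helper names hypothetical, and R(z) drawn at random purely for illustration), a filter $(1 - z^{-1})^N R(z)$ has vanishing moments $m_k = \sum_n n^k h_n$ for $k = 0, \ldots, N-1$, and therefore maps a polynomial sequence of degree N − 1 to zero away from the boundaries.

```python
import numpy as np

def zeros_at_one(N, r):
    """Taps of H(z) = (1 - z^-1)^N R(z), coefficients in ascending powers of z^-1."""
    h = np.asarray(r, dtype=float)
    for _ in range(N):
        h = np.convolve(h, [1.0, -1.0])
    return h

def moments(h, K):
    """m_k = sum_n n^k h_n for k = 0, ..., K-1."""
    n = np.arange(len(h), dtype=float)
    return np.array([np.sum(n ** k * h) for k in range(K)])

rng = np.random.default_rng(1)
N = 3
h = zeros_at_one(N, rng.standard_normal(4))   # random R(z), for illustration only
print(np.round(moments(h, N + 1), 10))         # m_0 = m_1 = m_2 = 0; m_3 is generally nonzero

# Consequently, filtering a polynomial of degree N - 1 = 2 gives zero away from the boundaries.
t = np.arange(20, dtype=float)
poly = 2.0 - t + 0.5 * t ** 2
print(np.allclose(np.convolve(poly, h, mode="valid"), 0.0))   # True
```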

7.5. Linear Phase Filter Banks

A linear phase two-channel perfect reconstruction filter bank has a 4-tap symmetric
synthesis lowpass filter G(z) with a zero at ω = 2π/3.

(i) Within a scaling factor, what is G(z)?

(ii) Prove that the analysis lowpass filter $\tilde{G}(z)$ cannot be a 2-tap filter.
(iii) Within a scaling factor, what is $\tilde{G}(z)$? (You may assume that H(z) is of shortest
possible length and that the delay is such as to make $\tilde{G}(z)$ causal.)

Solution:

(i) The filter is of the form

$$G(z) = a + bz^{-1} + bz^{-2} + az^{-3} = a(1 + z^{-3}) + b(z^{-1} + z^{-2}).$$

We are given that the filter has a zero at ω = 2π/3. Thus

$$G(e^{j2\pi/3}) = a(1 + 1) + b\big(-\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2} - \tfrac{1}{2} + j\tfrac{\sqrt{3}}{2}\big) = 2a - b = 0,$$

leading to b = 2a and thus

$$G(z) = a(1 + 2z^{-1} + 2z^{-2} + z^{-3}).$$

(ii) According to Proposition 7.11, since we have an even-length, symmetric filter, the
other filter H(z) will be antisymmetric and of even length, with a length differing from that
of G(z) by an even multiple of 2. Thus, H(z) can be of length 4, 8, 12, .... Using (7.75a) we get that

$$\tilde{G}(z) = z^{-k}H(-z),$$

and thus $\tilde{G}(z)$ can only be of length 4, 8, 12, ...; in particular, it cannot be a 2-tap filter.
(iii) Since H(z) is antisymmetric and of length 4, we get

$$H(z) = c + dz^{-1} - dz^{-2} - cz^{-3} = c(1 - z^{-3}) + d(z^{-1} - z^{-2}).$$

Taking a = 1, writing the determinant of the polyphase matrix of G(z) and H(z), and forcing it to be a
delay gives

$$(d - 2c)(1 + z^{-2}) + 2(2d - c)z^{-1} = pz^{-1},$$

leading to d = 2c. Therefore,

$$\tilde{G}(z) = z^{-k}H(-z) = z^{-k}c(1 - 2z^{-1} - 2z^{-2} + z^{-3}),$$

and choosing k = 0 to leave $\tilde{G}$ causal,

$$\tilde{G}(z) = c(1 - 2z^{-1} - 2z^{-2} + z^{-3}).$$
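The solution can be verified with a few polynomial multiplications. This sketch (assuming NumPy, with a = c = 1) checks the zero of G at ω = 2π/3, computes the polyphase determinant $G_0H_1 - G_1H_0$ with coefficients in powers of $z^{-1}$, and forms $\tilde{G}(z) = H(-z)$.

```python
import numpy as np

g = np.array([1.0, 2.0, 2.0, 1.0])      # G(z) = 1 + 2z^-1 + 2z^-2 + z^-3   (a = 1)
h = np.array([1.0, 2.0, -2.0, -1.0])    # H(z) = c(1 - z^-3) + d(z^-1 - z^-2), c = 1, d = 2c
gt = h * (-1.0) ** np.arange(4)         # G~(z) = H(-z), with k = 0

n = np.arange(4)
zero = np.sum(g * np.exp(-1j * 2 * np.pi / 3 * n))   # G(e^{j 2 pi / 3})
print(abs(zero) < 1e-12)                               # True

# Determinant of the polyphase matrix [[G_0, H_0], [G_1, H_1]] (coefficients in z^-1)
G0, G1, H0, H1 = g[0::2], g[1::2], h[0::2], h[1::2]
det = np.convolve(G0, H1) - np.convolve(G1, H0)
print(det)                                             # [0. 6. 0.], i.e. the pure delay 6 z^-1
print(gt)                                              # [ 1. -2. -2.  1.]
```

As a further check, the product $G(z)\tilde{G}(z)$ has a single nonzero odd-indexed coefficient (at $z^{-3}$), which is the perfect reconstruction condition for this causal pair.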

7.6. Filter Design
Let

$$A(z) = \frac{1}{16}(1 + z)^2(1 + z^{-1})^2(-z + 4 - z^{-1}).$$

(i) Find the roots of A(z).

(ii) Based on the factorization of this polynomial, find all the possible nontrivial orthogonal
filters. Verify that your solutions are power complementary.

(iii) Based on the factorization of this polynomial, construct all possible nontrivial linear
phase (having a center of (anti-)symmetry) biorthogonal filters satisfying the
constraint G(1) = 1.

(iv) Using Matlab, plot the magnitude responses of all of the filters obtained in (ii) and
(iii) and compare their shape.

Solution:

(i) A(z) has 4 roots at z = −1 and 2 roots at z = 2 + √3 and z = 2 − √3. We can therefore
factor A(z) as

$$A(z) = \frac{1}{16(2+\sqrt{3})}(1 + z)^2(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big).$$

(ii) For the filter G(z) to be orthogonal, we need to have $A(z) = G(z)G(z^{-1})$. Table
E7.6-1 lists possible choices.

(iii) To have a pair of biorthogonal filters, we need to find filters G(z) and $\tilde{G}(z)$ such
that $A(z) = G(z)\tilde{G}(z)$. Additionally, the filters need to be linear phase and have
G(1) = 1. Table E7.6-2 lists possible choices. Note that each G(z) can be multiplied
by $(-z + 4 - z^{-1})/2$, as long as the corresponding $\tilde{G}(z)$ is divided by the same
$(-z + 4 - z^{-1})/2$. More generally, each G(z) can be multiplied by an arbitrary
polynomial B(z) with B(1) = 1, as long as the corresponding $\tilde{G}(z)$ is divided by
the same B(z). Note that in most cases this will lead to an IIR filter $\tilde{G}(z)$.
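The factorization and one orthogonal choice can be checked on the unit circle. This sketch assumes NumPy and takes the factor $G(z) = \frac{1}{4\sqrt{2+\sqrt{3}}}(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z^{-1}\big)$ from Table E7.6-1; it verifies $|G(e^{j\omega})|^2 = A(e^{j\omega})$ and power complementarity $|G(e^{j\omega})|^2 + |G(e^{j(\omega+\pi)})|^2 = 2$.

```python
import numpy as np

r = 2 + np.sqrt(3.0)
w = np.linspace(-np.pi, np.pi, 513)
z = np.exp(1j * w)

def A(z):
    return (1 + z) ** 2 * (1 + 1 / z) ** 2 * (-z + 4 - 1 / z) / 16

def G(z):
    # one choice from Table E7.6-1: (1 + z^-1)^2 (1 - (2 + sqrt 3) z^-1) / (4 sqrt(2 + sqrt 3))
    return (1 + 1 / z) ** 2 * (1 - r / z) / (4 * np.sqrt(r))

print(np.roots([-1.0, 4.0, -1.0]))   # roots of -z^2 + 4z - 1, that is, 2 + sqrt(3) and 2 - sqrt(3)
print(np.allclose(np.abs(G(z)) ** 2, A(z).real))                  # G(z) G(z^-1) = A(z)
print(np.allclose(np.abs(G(z)) ** 2 + np.abs(G(-z)) ** 2, 2.0))   # power complementary
```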

Table E7.6-1: Possible choices for the orthogonal filter G(z) in Exercise 7.6. Each choice is $\pm\frac{1}{4\sqrt{2+\sqrt{3}}}$ times one of:
  $(1 + z)(1 + z^{-1})\big(1 - (2+\sqrt{3})z\big)$
  $(1 + z)(1 + z^{-1})\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $(1 + z)^2\big(1 - (2+\sqrt{3})z\big)$
  $(1 + z)^2\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z\big)$
  $(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z^{-1}\big)$

Table E7.6-2: Possible choices for the biorthogonal filter pair $(G(z), \tilde{G}(z))$ in Exercise 7.6.
  $G(z) = 1$; $\tilde{G}(z) = \frac{1}{16(2+\sqrt{3})}(1 + z)^2(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{2}(1 + z)$; $\tilde{G}(z) = \frac{1}{8(2+\sqrt{3})}(1 + z)(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{2}(1 + z^{-1})$; $\tilde{G}(z) = \frac{1}{8(2+\sqrt{3})}(1 + z)^2(1 + z^{-1})\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{4}(1 + z)^2$; $\tilde{G}(z) = \frac{1}{4(2+\sqrt{3})}(1 + z^{-1})^2\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{4}(1 + z^{-1})^2$; $\tilde{G}(z) = \frac{1}{4(2+\sqrt{3})}(1 + z)^2\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{4}(1 + z)(1 + z^{-1})$; $\tilde{G}(z) = \frac{1}{4(2+\sqrt{3})}(1 + z)(1 + z^{-1})\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{8}(1 + z)^2(1 + z^{-1})$; $\tilde{G}(z) = \frac{1}{2(2+\sqrt{3})}(1 + z^{-1})\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{8}(1 + z)(1 + z^{-1})^2$; $\tilde{G}(z) = \frac{1}{2(2+\sqrt{3})}(1 + z)\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$
  $G(z) = \frac{1}{16}(1 + z)^2(1 + z^{-1})^2$; $\tilde{G}(z) = \frac{1}{2+\sqrt{3}}\big(1 - (2+\sqrt{3})z\big)\big(1 - (2+\sqrt{3})z^{-1}\big)$

7.7. Two-Channel Transmultiplexer

A transmultiplexer is a two-channel filter bank with a reversed order of analysis and
synthesis parts, as shown in Figure E7.7-1. If the inputs into the synthesis filter bank
are α(z) and β(z), and the output of the synthesis bank is X(z), find the input-output
relationship of the transmultiplexer and comment.

Solution: Given that we know a sequence can be split into two bands and perfectly
reconstructed, can we do the converse? Intuitively, it seems that such a scheme should work,
and indeed it does, as we show shortly. Beyond the algebraic result, transmultiplexers are of
great importance in practice, since they form the basis for frequency-division multiplexing,
as we show as well. An orthogonal decomposition with many channels leads to orthogonal
frequency-division multiplexing (OFDM), the basis for many modulation schemes used in
communications, such as IEEE 802.11.
We start with

$$X(z) = \begin{bmatrix} G(z) & H(z) \end{bmatrix}\begin{bmatrix} \alpha(z^2) \\ \beta(z^2) \end{bmatrix}. \qquad \text{(E7.7-1)}$$
(Block diagram: α and β are upsampled by 2, filtered by $g_n$ and $h_n$, and summed; the sum is filtered by $g_{-n}$ and $h_{-n}$ and downsampled by 2 to recover α and β.)

Figure E7.7-1: Transmultiplexer synthesizes two-channel sequences to a single
upsampled sequence, and then splits them again.



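The output of the analysis filter bank is

$$\frac{1}{2}\begin{bmatrix} G(z^{-1/2}) & G(-z^{-1/2}) \\ H(z^{-1/2}) & H(-z^{-1/2}) \end{bmatrix}\begin{bmatrix} X(z^{1/2}) \\ X(-z^{1/2}) \end{bmatrix}, \qquad \text{(E7.7-2)}$$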



and we want this to be identical to the inputs α(z) and β(z). Substituting (E7.7-1) into
this, we see that the input-output relationship is given by the following matrix product
(where we formally replaced $z^{1/2}$ by z):

$$\frac{1}{2}\begin{bmatrix} G(z^{-1}) & G(-z^{-1}) \\ H(z^{-1}) & H(-z^{-1}) \end{bmatrix}\begin{bmatrix} G(z) & H(z) \\ G(-z) & H(-z) \end{bmatrix}. \qquad \text{(E7.7-3)}$$

For this to be the identity, we require

$$\begin{aligned}
G(z)G(z^{-1}) + G(-z)G(-z^{-1}) &= 2, & \text{(E7.7-4a)} \\
H(z)H(z^{-1}) + H(-z)H(-z^{-1}) &= 2, & \text{(E7.7-4b)} \\
G(z)H(z^{-1}) + G(-z)H(-z^{-1}) &= 0, & \text{(E7.7-4c)} \\
H(z)G(z^{-1}) + H(-z)G(-z^{-1}) &= 0. & \text{(E7.7-4d)}
\end{aligned}$$



These relations are satisfied if and only if G(z) and H(z) are orthogonal
filters, since (E7.7-4a) and (E7.7-4b) are orthogonality relations of the filters' impulse
responses with respect to their even translates as in (7.13) and (7.14), respectively, while
(E7.7-4c) and (E7.7-4d) are the orthogonality relations between $\{g_n\}$ and $\{h_{n-2k}\}$ as in (7.22).

Therefore, if we have a two-channel orthogonal filter bank, it does not matter whether we
cascade analysis followed by synthesis, or synthesis followed by analysis; both lead
to perfect reconstruction systems. The same applies to biorthogonal filter banks, left for
Exercise 7.20. An intuitive way to understand the result is to recall that when matrices
are square, the left inverse is also the right inverse.

We now give some geometrical perspective. The output of the synthesis filter bank is
the sum of two sequences $x_V$ and $x_W$ from $V = \operatorname{span}(\{g_{n-2k}\}_{k\in\mathbb{Z}})$ and $W = \operatorname{span}(\{h_{n-2k}\}_{k\in\mathbb{Z}})$,
respectively. Because of the orthogonality of $\{g_{n-2k}\}$ and $\{h_{n-2k}\}$,

$$V \perp W.$$
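On periodized sequences the whole argument reduces to a square orthogonal matrix. The sketch below assumes NumPy, circular convolution of length N = 16, and the length-4 Daubechies filters (the helper `circulant` is hypothetical): the synthesis matrix Φ stacks the columns $g_{n-2k}$ and $h_{n-2k}$, so analysis after synthesis ($\Phi^T\Phi$) and synthesis after analysis ($\Phi\Phi^T$) are both the identity, and the two channel components are orthogonal.

```python
import numpy as np

def circulant(c, N):
    """N x N matrix of circular convolution with taps c (column n holds c shifted by n)."""
    M = np.zeros((N, N))
    for n in range(N):
        for k, ck in enumerate(c):
            M[(n + k) % N, n] += ck
    return M

N = 16
s3 = np.sqrt(3.0)
g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # orthogonal lowpass
h = (-1.0) ** np.arange(len(g)) * g[::-1]                            # matching highpass
U2 = np.eye(N)[:, 0::2]                                              # upsampling by 2
Phi = np.hstack([circulant(g, N) @ U2, circulant(h, N) @ U2])        # synthesis: [alpha; beta] -> x

rng = np.random.default_rng(0)
alpha, beta = rng.standard_normal(N // 2), rng.standard_normal(N // 2)
x = Phi @ np.concatenate([alpha, beta])     # synthesis first (transmultiplexer) ...
recovered = Phi.T @ x                       # ... followed by analysis
x_V = Phi[:, : N // 2] @ alpha              # component in V = span{g_{n-2k}}
x_W = Phi[:, N // 2 :] @ beta               # component in W = span{h_{n-2k}}
print(np.allclose(recovered, np.concatenate([alpha, beta])), abs(x_V @ x_W) < 1e-10)  # True True
```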



Exercises 

7.1. FIR Filter Bank

For the system specified by the input-output relation

$$y = G U_2 D_2 \tilde{G} x + H U_2 D_2 \tilde{H} x,$$

(i) Write the z-transform of $y_n$ and $e^{j\pi n}y_n$ using matrix notation, that is, specify
$[Y(z)\;\; Y(-z)]^T$.

(ii) Assuming that perfect reconstruction can be achieved by this system, show that
knowing the analysis filters $\tilde{G}$, $\tilde{H}$, we can specify the synthesis filters by

$$G(z) = \frac{2z^{\ell}}{\det(\Phi_m(z))}\tilde{H}(-z) \quad\text{and}\quad H(z) = -\frac{2z^{\ell}}{\det(\Phi_m(z))}\tilde{G}(-z),$$

where ℓ is any integer and $\Phi_m(z)$ is called a modulation matrix, a matrix of the analysis
filters, $\tilde{G}(z)$ and $\tilde{H}(z)$, and their modulated versions, $\tilde{G}(-z)$ and $\tilde{H}(-z)$. Specify
$\Phi_m(z)$ and its determinant.

(iii) If the analysis filters $\tilde{G}$ and $\tilde{H}$ are FIR, find a condition that will make the synthesis
filters, G and H, FIR also. Write the z-transform of these synthesis filters as a
function of the analysis filters.

7.2. Aliasing Cancelation

For the system of Exercise 7.1, first find the condition required to cancel all aliasing in the
output signal, that is, for the term with X(−z) to be 0. Then, show that a solution of this
condition is given by

$$H(z) = z^{-1}G(-z^{-1}), \qquad \tilde{G}(z) = G(z^{-1}), \qquad \tilde{H}(z) = zG(-z), \qquad \text{(P7.2-1)}$$

as proposed by Smith and Barnwell [135]. Finally, using this solution, find the condition
for perfect reconstruction as a function of G(z) in the Fourier domain.

7.3. Multirate Filtering

For the system in Figure P7.3-1, find the expressions in both the DTFT and z-transform
domains at each point in the system. Draw the corresponding spectra assuming that g is
an ideal half-band lowpass filter and that x has the spectrum as in Figure P7.3-2.

(block diagram not reproduced)

Figure P7.3-1: Multirate system in Exercise 7.3.

(spectrum plot not reproduced; frequency axis marked at π/3, 2π/3, and π)

Figure P7.3-2: Spectrum of the input for the multirate system in Exercise 7.3.



7.4. Rudin-Shapiro Polynomial

Rudin-Shapiro polynomials are defined by the following recursive equations:

$$\begin{aligned} P_0(z) &= Q_0(z) = 1, \\ P_{n+1}(z) &= P_n(z) + z^{2^n}Q_n(z), \\ Q_{n+1}(z) &= P_n(z) - z^{2^n}Q_n(z). \end{aligned} \qquad \text{(P7.4-1)}$$

(i) Derive the Rudin-Shapiro polynomial pair (P, Q) of degree 15.

(ii) Prove that for n > 0, $P_n$ and $Q_n$ lead to a two-channel perfect reconstruction
orthogonal filter bank, that is, show the following:

$$\begin{aligned} P_n(z)P_n(z^{-1}) + Q_n(z)Q_n(z^{-1}) &= k_n, \\ P_n(z)P_n(-z^{-1}) + Q_n(z)Q_n(-z^{-1}) &= 0. \end{aligned}$$

Determine the constant $k_n$.
(iii) Prove

$$|P_n(e^{j\omega})|^2 + |Q_n(e^{j\omega})|^2 = 2^{n+1}. \qquad \text{(P7.4-2)}$$

(iv) Prove

$$\frac{E\big[|P_n(e^{j\omega})|^2\big]}{\max_\omega |P_n(e^{j\omega})|^2} \geq \frac{1}{2},$$

where $E[|P_n(e^{j\omega})|^2]$ denotes the average value of $|P_n(e^{j\omega})|^2$ over one period of
length 2π.
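Before attempting the proofs, the statements in (iii) and (iv) can be explored numerically. In this sketch (assuming NumPy; `rudin_shapiro` is a hypothetical helper), multiplying by $z^{2^n}$ in (P7.4-1) amounts to appending coefficients after the current length $2^n$, and the FFT samples $P_n$ and $Q_n$ on the unit circle.

```python
import numpy as np

def rudin_shapiro(n):
    """Coefficients (ascending powers of z) of P_n and Q_n from recursion (P7.4-1)."""
    p, q = np.array([1.0]), np.array([1.0])
    for _ in range(n):
        p, q = np.concatenate([p, q]), np.concatenate([p, -q])
    return p, q

M = 1024                                          # frequency grid size (>= filter length)
for n in range(1, 6):
    p, q = rudin_shapiro(n)
    Pw, Qw = np.fft.fft(p, M), np.fft.fft(q, M)   # samples of P_n, Q_n on the unit circle
    total = np.abs(Pw) ** 2 + np.abs(Qw) ** 2
    ratio = np.mean(np.abs(Pw) ** 2) / np.max(np.abs(Pw) ** 2)
    print(n, len(p), np.allclose(total, 2 ** (n + 1)), round(float(ratio), 3))
```

A numerical check on a grid is, of course, not a proof; it only confirms what (P7.4-2) and (iv) ask you to establish.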

7.5. Sequence Approximation

You are asked to approximate a sequence $x_n$ with a sequence $y_n$ having the following form:

$$y_n = \sum_{k\in\mathbb{Z}} \beta_k\,\phi_{n-2k},$$

where $\phi_n$ is a length-4 sequence $\phi = [\,\ldots\;\, 0\;\, 1\;\, 3\;\, 3\;\, {-1}\;\, 0\;\, \ldots\,]/(2\sqrt{5})$, and $\beta_n$ is
some sequence of coefficients. The DTFT of the original sequence $x_n$ is $X(e^{j\omega})$, and the
DTFT of your approximation $y_n$ is $Y(e^{j\omega})$.

(i) Draw a block diagram of a system having $\beta_n$ as input and $y_n$ as output. Explain
what the system does.

(ii) We can compute $\beta_n = (x_{2n} + 3x_{2n+1} + 3x_{2n+2} - x_{2n+3})/(2\sqrt{5})$. Write an expression
for $Y(e^{j\omega})$ in terms of $X(e^{j\omega})$. Draw a block diagram of a system having $x_n$ as
input and $y_n$ as output; explain what that system does.
(iii) You wish to find a sequence $z_n$ which, when added to $y_n$, exactly recovers $x_n$, that
is, $y_n + z_n = x_n$. For the coefficients $\beta_n$ in (ii), the sequence $z_n$ has the same form
as $y_n$, that is,

$$z_n = \sum_{k\in\mathbb{Z}} \alpha_k\,\gamma_{n-2k}.$$

Specify the required $\gamma_n$ sequence, and give an expression for the coefficients $\alpha_n$ in
terms of $x_n$.

7.6. Zero-Moment Property of Highpass Filters

Verify that a filter with the z-transform $H(z) = (1 - z^{-1})^N R(z)$ has its first (N − 1)
moments zero, as in (7.47).



7.7. Reproduction of Sinusoids

Consider the proof of Theorem 7.5 about reproduction of polynomials.

(i) Modify the argument so that complex sinusoids of frequency $\omega_0$ are reproduced by
the lowpass channel.
(ii) Extend the above to real sinusoids of frequency $\omega_0$.

7.8. Orthogonal Filters Are Maximally Flat

We consider the design of an orthogonal filter with N zeros at z = −1. Its deterministic
autocorrelation is of the form $A(z) = (1 + z)^N(1 + z^{-1})^N Q(z)$ as in (7.52) and satisfies
(7.33).

(i) Verify that $A(e^{j\omega})$ can be written as

$$A(e^{j\omega}) = 2^N(1 + \cos\omega)^N Q(e^{j\omega}).$$

(ii) Show that $A(e^{j\omega})$ and its (2N − 1) derivatives are zero at ω = π, and show that
$A(e^{j\omega})$ has (2N − 1) zero derivatives at ω = 0.
(iii) Show that the previous result leads to $|G(e^{j\omega})|$ being maximally flat at ω = 0 and
ω = π, that is, $|G(e^{j\omega})|$ has (N − 1) zero derivatives.

7.9. Spectral Factorization

For $C(z) = (1 + z)^3(1 + z^{-1})^3$, verify that $D(z) = (3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2})/256$
is such that $A(z) = C(z)D(z)$ is a valid deterministic autocorrelation. Perform spectral
factorization and find the filters of this orthogonal filter bank.
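One way to carry out the spectral factorization numerically is sketched below, assuming NumPy. Since $z^2D(z)$ is palindromic, its roots come in reciprocal pairs; keeping the two roots inside the unit circle together with three zeros at z = −1 gives the minimum-phase factor G(z), normalized so that $G(1) = \sqrt{2}$. This root-selection choice is one option among several (other selections give other valid factors).

```python
import numpy as np

# Roots of z^2 D(z) (up to the factor 1/256); keep one root of each reciprocal pair.
d_roots = np.roots([3.0, -18.0, 38.0, -18.0, 3.0])
inside = d_roots[np.abs(d_roots) < 1]

g = np.array([1.0])
for _ in range(3):
    g = np.convolve(g, [1.0, 1.0])                # (1 + z^-1)^3
g = np.convolve(g, np.real(np.poly(inside)))      # times the product of (1 - r z^-1) over kept roots
g *= np.sqrt(2.0) / g.sum()                       # normalize so that G(1) = sqrt(2)

# A(z) = C(z) D(z) from its coefficients (lags -5, ..., 5)
c = np.array([1.0, 6.0, 15.0, 20.0, 15.0, 6.0, 1.0])    # (1 + z)^3 (1 + z^-1)^3
d = np.array([3.0, -18.0, 38.0, -18.0, 3.0]) / 256.0
a_target = np.convolve(c, d)
a_check = np.correlate(g, g, mode="full")        # deterministic autocorrelation of g

print(np.round(g, 4))
print(np.allclose(a_check, a_target))            # True if G(z) G(z^-1) = A(z)
```

The even-lag coefficients of `a_target` other than the center vanish, which is exactly the validity condition A(z) + A(−z) = 2.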

7.10. Orthogonal Filter Bank Design 

Given is a two-channel FIR filter bank with real coefficients. 

(i) Let $G(z) = \beta(1 + \alpha z)$ with $\alpha \in \mathbb{R}\setminus\{0\}$ and $\beta \in \mathbb{R}$. Give the relation between α and
β such that the filter leads to an orthogonal perfect reconstruction filter bank with
real coefficients.
(ii) Now let G(z) = -^(1 + 2z) 2 . Does this filter lead to an orthogonal perfect recon- 
struction FIR filter bank with real coefficients? 
(iii) Let 

l + 2z\ 2 /1 + 2Z" 1 ' ~ 



If A(z) is the z-transform of a deterministic autocorrelation of a filter, find the 
shortest polynomial R(z) such that the associated filter bank leads to perfect recon- 
struction. Justify why R(z) cannot be a constant, 
(iv) Using the expression for A(z), find a set of four filters (analysis and synthesis) leading 
to an orthogonal two-channel perfect reconstruction filter bank. 

7.11. Infinite-Dimensional Bases
Let

$$\begin{aligned} a_n &= \delta_n + 3\delta_{n-1} + 3\delta_{n-2} + \delta_{n-3}, \\ b_n &= \delta_n + 3\delta_{n-1} - 3\delta_{n-2} - \delta_{n-3}. \end{aligned}$$

The set $\Phi = \{a_{n-2k}, b_{n-2k}\}_{k\in\mathbb{Z}}$ is a basis for $\ell^2(\mathbb{Z})$.
(i) Is Φ an orthonormal basis? Why?

(ii) If yes, demonstrate Parseval's equality. If no, use Gram-Schmidt orthogonalization
to obtain an orthonormal basis for the span of Φ.

7.12. Complementary Filters 

Using Proposition 7.8, prove that $G(z) = (1 + z^{-1})^N$ always has a complementary $H(z)$.

7.13. Interpolation Followed by Decimation 

Given is the following system: an input $x$ is upsampled by 2, followed by interpolation with a filter with z-transform $G(z)$ for magnification of the signal. Then, to recover the original signal size, the signal is filtered by $\tilde{G}(z)$ followed by downsampling by 2, to obtain a reconstruction $\hat{x}$.

(i) What does the product filter $C(z) = G(z)\tilde{G}(z)$ have to satisfy for $\hat{x}$ to be a perfect replica of $x$ (possibly with a shift)?

(ii) Given $G(z)$, what condition does it have to satisfy so that one can find $\tilde{G}(z)$ achieving perfect reconstruction?
(iii) For the following two filters,

$$G'(z) = 1 + z^{-1} + z^{-2} + z^{-3}, \qquad G''(z) = 1 + z^{-1} + z^{-2} + z^{-3} + z^{-4},$$

give filters $\tilde{G}'(z)$ and $\tilde{G}''(z)$ so that perfect reconstruction is achieved (if possible, give shortest such filters; if not, say why).

7.14. Structure of Linear-Phase Solutions

Prove the three assertions in Proposition 7.11 on the structure of linear-phase solutions.
7.15. Linear Phase Testing Condition

Show that, when the filters $G(z)$ and $H(z)$ are of the same length and linear phase, the following holds:

$$\Phi_p(z) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \Phi_p(z^{-1}) \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \tag{P7.15-1}$$

(Hint: Find out the form of the polyphase components of each linear phase filter.)

7.16. Complex Linear Phase Orthogonal Solutions 

Proposition 7.12 states that there are no real linear-phase orthogonal FIR filter banks.

(i) Show that if the filter coefficients can be complex valued, then solutions exist.
(ii) For length-6 filters, find the solution with a maximum number of zeros at $\omega = \pi$. (Hint: Refactor the $A(z)$ that leads to the D3 filter into complex-valued symmetric/antisymmetric filters.)

7.17. Spectral Factorization Method For Two-Channel Filter Banks 

Consider the factorization of P(z) in order to obtain orthogonal or biorthogonal filter 
banks. 

(i) Let

$$A(z) = -\frac{1}{4}z^3 + \frac{1}{2}z + 1 + \frac{1}{2}z^{-1} - \frac{1}{4}z^{-3}.$$

Build an orthogonal filter bank based on this $A(z)$. If the function is not positive on the unit circle, apply an adequate correction as in Section 7.3.1.

(Hint: This correction, due to Smith and Barnwell, is applied by finding $\epsilon = \min_\omega A(e^{j\omega})$. Then take $A'(z) = (A(z) - \epsilon)/(1 - \epsilon)$. $A'(z)$ should now satisfy the requirements $A'(z) + A'(-z) = 2$ and $A'(e^{j\omega}) \ge 0$.)

(ii) Alternatively, compute a linear-phase factorization of $A(z)$. In particular, choose $G(z) = z + 1 + z^{-1}$. Give the other filters in this biorthogonal filter bank.

(iii) Assume now that a particular A(z) was designed using the Parks-McClellan al- 
gorithm (which leads to equiripple pass and stopbands). Show that if A(z) is not 
positive on the unit circle, then the correction to make it positive places all stopband 
zeros on the unit circle. 
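The Smith-Barnwell correction in the hint of Exercise 7.17(i) can be tried numerically. This is a minimal sketch, assuming $\epsilon$ is approximated by the minimum of $A(e^{j\omega})$ over a uniform frequency grid (an exact minimization would be needed for a final design); the function names are hypothetical.

```python
import math

# A(z) from Exercise 7.17(i), stored as {lag: coefficient}; real and symmetric.
A = {-3: -0.25, -1: 0.5, 0: 1.0, 1: 0.5, 3: -0.25}


def evaluate(coeffs, omega):
    """A(e^{j omega}) for a real, symmetric coefficient dictionary (imaginary parts cancel)."""
    return sum(c * math.cos(lag * omega) for lag, c in coeffs.items())


def smith_barnwell(coeffs, grid_size=4096):
    """Return (eps, corrected, grid): eps approximates min A(e^{j omega}) on a uniform grid,
    and corrected holds the coefficients of (A(z) - eps) / (1 - eps)."""
    grid = [2 * math.pi * t / grid_size for t in range(grid_size)]
    eps = min(evaluate(coeffs, w) for w in grid)
    corrected = {lag: (c - (eps if lag == 0 else 0.0)) / (1 - eps) for lag, c in coeffs.items()}
    return eps, corrected, grid


eps, A_prime, grid = smith_barnwell(A)
print(round(eps, 4))  # -0.0758: A(z) is indeed negative somewhere on the unit circle
```

Subtracting $\epsilon$ touches only lag 0, and dividing by $1 - \epsilon$ brings that coefficient back to 1, so the odd-lag structure (and hence $A'(z) + A'(-z) = 2$) is preserved while the minimum on the unit circle is lifted to zero.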

7.18. Filter Bank Design Using Lifting 

In a two-channel perfect reconstruction filter bank, FIR solutions are possible if and only if the polyphase matrix of the synthesis bank $\Phi_p(z)$ is such that its determinant is a pure delay, that is, $\det\Phi_p(z) = z^{-k}$.

(i) Prove that if $\Phi_p(z)$ meets the above property, then so does $\Phi'_p(z) = \Phi_p(z)L(z)$, where

$$L(z) = \begin{bmatrix} 1 & R(z) \\ 0 & 1 \end{bmatrix}.$$

(ii) Give the corresponding diagram for the polyphase implementation of the new synthesis filter bank. What are the corresponding filters $G'(z)$ and $H'(z)$?

(iii) Assume $R(z)$ is an FIR filter of length $L$ with $r_0 \ne 0$ and $r_{L-1} \ne 0$, $L > 1$. Assume also that the initial filter bank was the Haar orthogonal filter bank. Is the resulting filter bank obtained after lifting orthogonal? In general, if $\Phi_p(z)$ is any paraunitary matrix, will the resulting filter bank be orthogonal? Justify your answer.

7.19. Quadrature Mirror Filters 

In this exercise, we explore properties of QMF filter banks. 

(i) Show that the following choice of filters:

analysis: $\tilde{G}(z) = G(z)$, $\tilde{H}(z) = G(-z)$;
synthesis: $G(z)$, $H(z) = -G(-z)$,

where $G(z)$ is a linear-phase FIR filter, automatically cancels aliasing in the output.
(ii) Give the input-output relationship.
(iii) Explain why QMF FIR filter banks cannot achieve perfect reconstruction.

7.20. Biorthogonal Transmultiplexer 

Show that a perfect reconstruction analysis-synthesis filter bank is also a perfect reconstruction synthesis-analysis filter bank, by generalizing Solved Exercise 7.7 to biorthogonal filter banks.

7.21. Frequency-Division Multiplexing with Haar Filters

Given is a transmultiplexer with Haar filters as in Table 7.8.

(i) Characterize explicitly the spaces $V$ and $W$, and show that they are orthogonal.
(ii) Give two example sequences from $V$ and $W$, as well as their sum.
(iii) Verify explicitly the perfect reconstruction property, either by writing the z-transform relations or the matrix operators (which is more intuitive and thus preferred).
Think of a piece of music: notes appear at different instants of time, and then fade away. These are short-time frequency events that the human ear identifies easily, but that are a challenge for a computer to understand. These notes are well-identified frequencies, but they are short lived. Thus, we would like to have access to a local Fourier transform, that is, a time-frequency analysis tool that understands the spectrum locally in time. While such a transform is known under many names, such as windowed Fourier transform, Gabor transform and short-time Fourier transform, we will use local Fourier transform exclusively throughout the manuscript. The local energy distribution over frequency, which can be obtained by squaring the magnitude of the local Fourier coefficients, is called the spectrogram, and is widely used in speech processing and time-series analysis.

Our purpose in this chapter is to explore what is possible in terms of obtaining such a local version of the Fourier transform of a sequence. While, unfortunately, we will see that, apart from short ones, there exist no good longer local Fourier bases, there exist good local Fourier frames, the topic we explore in Chapter 10. Moreover, there exist good local cosine bases, where the complex-exponential modulation is replaced by cosine modulation. These constructions will all be implemented using general $N$-channel filter banks, the first generalization of the basic two-channel filter bank block we just saw in the last chapter.

8.1 Introduction 

We now look at the simplest example of a local Fourier transform decomposing the spectrum into $N$ equal parts. As we have learned in the previous chapter, for $N = 2$, two-channel filter banks do the trick; for a general $N$, it is no surprise that $N$-channel filter banks perform that role, and we now show just that.

If we have an infinite-length sequence, we could use the DTFT we discussed in Chapter 2; however, as we mentioned earlier, this representation will erase any time-local information present in the sequence. We could, however, use another tool also discussed in Chapter 2: the DFT. While we have said that the DFT is a natural tool for the analysis of either periodic sequences or infinite-length sequences with a finite number of nonzero samples, circularly extended, we can also use the DFT as a tool to observe the local behavior of an infinite-length sequence by dividing it into pieces of length $N$, followed by a length-$N$ DFT.

Implementing a Length-N DFT Basis Expansion

We now mimic what we have done for the Haar basis in the previous chapter, that is, implement the DFT basis using signal processing machinery. We start with the basis view of the DFT from Section 2.6.1; we assume this finite-dimensional basis is applied to length-$N$ pieces of our input sequence. The final basis then consists of $\{\varphi_i\}_{i=0}^{N-1}$ from (2.160) and all their shifts by integer multiples of $N$, that is,

$$\Phi_{\mathrm{DFT}} = \{\varphi_{i,n-Nk}\}_{i\in\{0,1,\dots,N-1\},\,k\in\mathbb{Z}}. \tag{8.1}$$

In other words, as opposed to two template basis sequences generating the entire basis by shifting as in the Haar case, not surprisingly, we now have $N$ template basis sequences generating the entire basis by shifting. We rename those template basis sequences to (we use the normalized version of the DFT):

$$g_{i,n} = \varphi_{i,n} = \frac{1}{\sqrt{N}}\, W_N^{-in}, \qquad n = 0, 1, \dots, N-1. \tag{8.2}$$

This is again done both for simplicity, as well as because it is the standard way these sequences are denoted.

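Then, we rewrite the reconstruction formula (2.159b) as

$$x_n = \sum_{i=0}^{N-1}\sum_{k\in\mathbb{Z}} \langle x_m, \varphi_{i,m-Nk}\rangle_m\, \varphi_{i,n-Nk} = \sum_{i=0}^{N-1}\sum_{k\in\mathbb{Z}} \alpha_{i,k}\, \varphi_{i,n-Nk} = \sum_{i=0}^{N-1}\sum_{k\in\mathbb{Z}} \alpha_{i,k}\, g_{i,n-Nk}, \tag{8.3a}$$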
Figure 8.1: An $N$-channel analysis/synthesis filter bank.



where we have renamed the basis sequences as explained above, as well as denoted the expansion coefficients as

$$\alpha_{i,k} = \langle x_n, \varphi_{i,n-Nk}\rangle_n = \langle x_n, g_{i,n-Nk}\rangle_n. \tag{8.3b}$$



As for Haar, we recognize each sum in (8.3a) as the output of upsampling by $N$ followed by filtering ((2.198) for upsampling factor $N$) with the input sequences being $\alpha_{i,k}$. Thus, each sum in (8.3a) can be implemented as the input sequence $\alpha_i$ going through an upsampler by $N$ followed by filtering by $g_i$ (right side in Figure 8.1).

By the same token, we can identify the computation of the expansion coefficients $\alpha_i$ in (8.3b) as (2.195) for downsampling factor $N$, that is, filtering by $g^*_{i,-n}$ followed by downsampling by $N$ (left side in Figure 8.1).

We can now merge the above operations to yield an $N$-channel filter bank implementing a DFT orthonormal basis expansion as in Figure 8.1. The part on the left, which computes the projection coefficients, is termed an analysis filter bank, while the part on the right, which computes the actual projections, is termed a synthesis filter bank.
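The analysis and synthesis steps (8.2)-(8.3b) can be sketched directly. This is a minimal, unoptimized illustration that evaluates the inner products literally rather than running filters, downsamplers and FFTs; it assumes a finite input whose length is a multiple of $N$ (an assumption of the sketch, not of the text), and the function names are hypothetical.

```python
import cmath
import math


def dft_analysis(x, N):
    """alpha[i][k] = <x, g_{i, n - N k}> with g_{i,n} = W_N^{-in} / sqrt(N), 0 <= n < N."""
    if len(x) % N:
        raise ValueError("this sketch assumes len(x) is a multiple of N")
    K = len(x) // N
    # The inner product conjugates g: conj(W_N^{-in}) = W_N^{in} = exp(-j 2 pi i n / N).
    return [[sum(x[N * k + n] * cmath.exp(-2j * math.pi * i * n / N) for n in range(N)) / math.sqrt(N)
             for k in range(K)]
            for i in range(N)]


def dft_synthesis(alpha, N):
    """x_n = sum_i sum_k alpha[i][k] g_{i, n - N k}, the right-hand side of (8.3a)."""
    K = len(alpha[0])
    x = [0j] * (N * K)
    for i in range(N):
        for k in range(K):
            for n in range(N):  # g_{i,m} is nonzero only for 0 <= m < N
                x[N * k + n] += alpha[i][k] * cmath.exp(2j * math.pi * i * n / N) / math.sqrt(N)
    return x


x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 7.0]
alpha = dft_analysis(x, N=4)
x_hat = dft_synthesis(alpha, N=4)
print(max(abs(a - b) for a, b in zip(x, x_hat)) < 1e-12)  # True: perfect reconstruction
```

Because the basis is orthonormal, the coefficient energy also equals the signal energy (Parseval), which the tests below check as well.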

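As before, once we have identified all the appropriate multirate components, we can examine the DFT filter bank via matrix operations. For example, in matrix notation, the analysis process (8.3b) can be expressed as

$$
\begin{bmatrix} \vdots \\ \alpha_{0,0} \\ \vdots \\ \alpha_{N-1,0} \\ \alpha_{0,1} \\ \vdots \\ \alpha_{N-1,1} \\ \vdots \end{bmatrix}
= \underbrace{\frac{1}{\sqrt{N}}\begin{bmatrix} \ddots & & & \\ & F & & \\ & & F & \\ & & & \ddots \end{bmatrix}}_{\Phi^*}
\begin{bmatrix} \vdots \\ x_0 \\ \vdots \\ x_{N-1} \\ x_N \\ \vdots \\ x_{2N-1} \\ \vdots \end{bmatrix}, \tag{8.4a}
$$

with $F$ as in (2.161a), and the synthesis process (8.3a) as

$$
\begin{bmatrix} \vdots \\ x_0 \\ \vdots \\ x_{N-1} \\ x_N \\ \vdots \\ x_{2N-1} \\ \vdots \end{bmatrix}
= \underbrace{\frac{1}{\sqrt{N}}\begin{bmatrix} \ddots & & & \\ & F^* & & \\ & & F^* & \\ & & & \ddots \end{bmatrix}}_{\Phi}
\begin{bmatrix} \vdots \\ \alpha_{0,0} \\ \vdots \\ \alpha_{N-1,0} \\ \alpha_{0,1} \\ \vdots \\ \alpha_{N-1,1} \\ \vdots \end{bmatrix}. \tag{8.4b}
$$

Of course, $\Phi$ is a unitary matrix, since $F/\sqrt{N}$ is.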



Localization Properties of the Length-N DFT It is quite clear that the time localization properties of the DFT are superior to those of the DTFT, as now we have access to time-local events at the resolution of length $N$. However, as a result, the frequency resolution must necessarily worsen; to see this, consider the frequency response of $g_0$ (the other $g_i$'s are modulated versions and therefore have the same frequency resolution):

$$G_0(e^{j\omega}) = \sqrt{N}\, e^{-j\omega(N-1)/2}\, \frac{\mathrm{sinc}(\omega N/2)}{\mathrm{sinc}(\omega/2)}, \tag{8.5}$$

that is, it is the DTFT of a box sequence (see Table 3.6). It has zeros at $\omega = 2\pi k/N$, $k = 1, 2, \dots, N-1$, but decays slowly in between.
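Formula (8.5) and its zeros are easy to confirm numerically. This sketch assumes the unnormalized convention $\mathrm{sinc}(t) = \sin(t)/t$ with $\mathrm{sinc}(0) = 1$, and compares the directly summed DTFT of the box $g_{0,n} = 1/\sqrt{N}$ against the closed form; the function names are illustrative.

```python
import cmath
import math


def box_response(N, omega):
    """DTFT of g_{0,n} = 1/sqrt(N), n = 0, ..., N-1, evaluated directly."""
    return sum(cmath.exp(-1j * omega * n) for n in range(N)) / math.sqrt(N)


def sinc(t):
    """sinc(t) = sin(t)/t with sinc(0) = 1 (the convention assumed in this sketch)."""
    return 1.0 if t == 0 else math.sin(t) / t


def closed_form(N, omega):
    """Right-hand side of (8.5); valid away from omega = 2 pi m, m != 0."""
    return math.sqrt(N) * cmath.exp(-1j * omega * (N - 1) / 2) * sinc(omega * N / 2) / sinc(omega / 2)


N = 8
for omega in (0.0, 0.3, 1.1, 2.5):
    assert abs(box_response(N, omega) - closed_form(N, omega)) < 1e-12

# Zeros at omega = 2 pi k / N, k = 1, ..., N - 1
print(max(abs(box_response(N, 2 * math.pi * k / N)) for k in range(1, N)) < 1e-12)  # True
```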

The orthonormal basis given by the DFT is just one of many basis options implementable by $N$-channel filter banks; many others, with template basis sequences with more than $N$ nonzero samples, are possible (similarly to the two-channel case). The DFT is a local Fourier version, as the time events can be captured with the resolution of $N$ samples.

Chapter Outline 

This short introduction leads naturally to the following structure of the chapter: In Section 8.2, we give an overview of $N$-channel filter banks. In Section 8.3, we present the local Fourier bases implementable by complex exponential-modulated filter banks. We then come to the crucial, albeit negative result: the Balian-Low theorem, which states the impossibility of good complex exponential-modulated local Fourier bases. We look into their applications: local power spectral density via periodograms, as well as transmultiplexing. To mitigate the Balian-Low negative result, Section 8.4 considers what happens if we use cosine modulation instead of the complex one to obtain a local frequency analysis. In the block-transform case, we encounter the discrete cosine transform, which plays a prominent role in image processing. In the sliding window case, a cosine-modulated filter bank allows the best of both worlds, namely an orthonormal basis with good time-frequency localization. We also discuss variations on this construction as well as an application to audio compression.

Notation used in this chapter: Unlike in the previous chapter, in this one, complex-coefficient filter banks are the norm. Thus, Hermitian transposition is used often, with the caveat that only coefficients should be conjugated and not $z$. We will point these out throughout the chapter.



8.2 N-Channel Filter Banks

We could imagine achieving our goal of splitting the spectrum into $N$ pieces in many ways; we have just seen one, achievable by using the DFT, a representation with reasonable time but poor frequency localization. Another option is using an ideal $N$th-band filter and its shifts (we have seen it in Tables 2.5 and 3.6 as well as (2.107) with $\omega_0 = 2\pi/N$, but repeat it here for completeness):

$$g_{0,n} = \frac{1}{\sqrt{N}}\,\mathrm{sinc}(\pi n/N) \;\xleftrightarrow{\text{DTFT}}\; G_0(e^{j\omega}) = \begin{cases} \sqrt{N}, & |\omega| \le \pi/N; \\ 0, & \text{otherwise}, \end{cases} \tag{8.6}$$

which clearly has perfect frequency localization but poor time localization, as its impulse response is a discrete sinc sequence. We have discussed this trade-off already in Chapter 6 and depict it in Figure 8.2.

The question now is whether there exist constructions in between these two extreme cases. Specifically, are there basis sequences with better frequency localization than the block transform, but with impulse responses that decay faster than the sinc impulse response (for example, a finite impulse response)?

To explore this issue, we introduce general $N$-channel filter banks. These are as shown in Figure 8.1, where the input is analyzed by $N$ filters $\tilde{g}_i$, $i = 0, 1, \dots, N-1$, and downsampled by $N$. The synthesis is done by upsampling by $N$, followed by interpolation with $g_i$, $i = 0, 1, \dots, N-1$.

The analysis of $N$-channel filter banks can be done in complete analogy to the two-channel case, by using the relevant equations for sampling rate changes by $N$. We now state these without proofs, and illustrate them on a particular case of a 3-channel filter bank, especially in the polyphase domain.

8.2.1 Orthogonal N-Channel Filter Banks

As for two-channel filter banks, $N$-channel orthogonal filter banks are of particular interest; the DFT is one example. We now briefly follow the path from the previous chapter and put in one place the relations governing such filter banks. The biorthogonal ones follow similarly, and we just touch upon them during the discussion of the polyphase view.

Orthogonality of a Single Filter Since we started with an orthonormal basis, the set $\{g_{i,n-Nk}\}_{k\in\mathbb{Z},\, i\in\{0,1,\dots,N-1\}}$ is an orthonormal set. We have seen in Section 2.7.5
Figure 8.2: Time- and frequency-domain behaviors of two orthonormal bases with $N = 8$ channels. (a)-(b) Sinc basis. (a) Impulse response is a sinc sequence, with poor time localization. (b) Frequency response is a box function, with perfect frequency localization. (c)-(d) DFT basis. (c) Impulse response is a box sequence, with good time localization. (d) Frequency response is a sinc function, with poor frequency localization.



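that each such filter is orthogonal and satisfies, analogously to (2.209):

$$
\begin{aligned}
\langle g_{i,n}, g_{i,n-Nk}\rangle_n = \delta_k
&\;\xleftrightarrow{\text{Matrix view}}\; D_N G_i^* G_i U_N = I\\
&\;\xleftrightarrow{\text{ZT}}\; \sum_{k=0}^{N-1} G_i(W_N^k z)\, G_{i*}(W_N^{-k} z^{-1}) = N\\
&\;\xleftrightarrow{\text{DTFT}}\; \sum_{k=0}^{N-1} \bigl|G_i(e^{j(\omega-(2\pi/N)k)})\bigr|^2 = N
\end{aligned}
\tag{8.7}
$$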



As before, the matrix view expresses the fact that the columns of $G_i U_N$ form an orthonormal set, and the DTFT version is a generalization of the quadrature mirror formula (2.208). For example, take $g_0$ and $N = 3$. The DTFT version is then

$$|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega - 2\pi/N)})|^2 + |G_0(e^{j(\omega - 4\pi/N)})|^2 = N;$$

essentially, the magnitude response squared of the filter, added to its modulated versions by $2\pi/N$ and $4\pi/N$, sums up to a constant. This is easily seen in the case of an ideal third-band filter, whose frequency response would be constant $\sqrt{3}$ on its passband (see (8.6)), and thus, squared and shifted across the spectrum, would satisfy the above.
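The DTFT form of (8.7) can be checked numerically for a concrete filter. The sketch below, with illustrative helper names, evaluates the aliased energy sum for the length-3 DFT prototype from (8.2), which should give the constant $N = 3$ at every frequency, and contrasts it with a unit-norm length-6 box, which is not orthogonal to its own shift by 3 and therefore fails the condition.

```python
import cmath
import math


def dtft(g, omega):
    """G(e^{j omega}) for a finite sequence g_0, ..., g_{L-1}."""
    return sum(gn * cmath.exp(-1j * omega * n) for n, gn in enumerate(g))


def aliased_energy(g, N, omega):
    """Left-hand side of the DTFT form of (8.7): sum_k |G(e^{j(omega - 2 pi k / N)})|^2."""
    return sum(abs(dtft(g, omega - 2 * math.pi * k / N)) ** 2 for k in range(N))


N = 3
box = [1 / math.sqrt(N)] * N                   # the DFT prototype g_0 from (8.2)
long_box = [1 / math.sqrt(2 * N)] * (2 * N)    # unit norm, but <g, g shifted by N> = 1/2

print([round(aliased_energy(box, N, w), 12) for w in (0.0, 0.4, 1.7, 3.0)])  # [3.0, 3.0, 3.0, 3.0]
print(round(aliased_energy(long_box, N, 0.0), 12))                          # 6.0, not 3
```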

Deterministic Autocorrelation of a Single Filter With $a_i$ the deterministic autocorrelation of $g_i$, the deterministic autocorrelation version of (8.7) is straightforward:

$$
\begin{aligned}
\langle g_{i,n}, g_{i,n-Nk}\rangle_n = a_{i,Nk} = \delta_k
&\;\xleftrightarrow{\text{Matrix view}}\; D_N A_i U_N = I\\
&\;\xleftrightarrow{\text{ZT}}\; \sum_{k=0}^{N-1} A_i(W_N^k z) = N\\
&\;\xleftrightarrow{\text{DTFT}}\; \sum_{k=0}^{N-1} A_i(e^{j(\omega-(2\pi/N)k)}) = N
\end{aligned}
\tag{8.8}
$$

Orthogonal Projection Property of a Single Channel Analogously to two channels, a single channel with orthogonal filters projects onto a coarse subspace $V_0$ or detail subspaces $W_i$, $i = 1, 2, \dots, N-1$, depending on the frequency properties of the filter. Each of the orthogonal projection operators is given as

$$P_{V_0} = G_0 U_N D_N G_0^*,$$
$$P_{W_i} = G_i U_N D_N G_i^*, \qquad i = 1, 2, \dots, N-1,$$

with the range

$$V_0 = \mathrm{span}(\{g_{0,n-Nk}\}_{k\in\mathbb{Z}}),$$
$$W_i = \mathrm{span}(\{g_{i,n-Nk}\}_{k\in\mathbb{Z}}), \qquad i = 1, 2, \dots, N-1.$$

Orthogonality of Filters As in the previous chapter, once the orthonormality of each single channel is established, what is left is the orthogonality of the channels among themselves. Again, all the expressions are analogous to the two-channel case; we state them here without proof. We assume below that $i \ne j$.

$$
\begin{aligned}
\langle g_{i,n}, g_{j,n-Nk}\rangle_n = 0
&\;\xleftrightarrow{\text{Matrix view}}\; D_N G_j^* G_i U_N = 0\\
&\;\xleftrightarrow{\text{ZT}}\; \sum_{k=0}^{N-1} G_i(W_N^k z)\, G_{j*}(W_N^{-k} z^{-1}) = 0\\
&\;\xleftrightarrow{\text{DTFT}}\; \sum_{k=0}^{N-1} G_i(e^{j(\omega-(2\pi/N)k)})\, G_j^*(e^{j(\omega-(2\pi/N)k)}) = 0
\end{aligned}
\tag{8.9}
$$
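For the DFT filters of (8.2), the DTFT sums in (8.7) and (8.9) can be collected into a single $N \times N$ table: entry $(i, j)$ should be $N$ on the diagonal and $0$ elsewhere. This is a minimal numerical sketch at one arbitrarily chosen frequency, with illustrative helper names; it is a spot check, not a proof.

```python
import cmath
import math


def dtft(g, omega):
    """G(e^{j omega}) for a finite sequence g_0, ..., g_{L-1}."""
    return sum(gn * cmath.exp(-1j * omega * n) for n, gn in enumerate(g))


def cross_alias_sum(gi, gj, N, omega):
    """DTFT form of (8.9): sum_k G_i(e^{j(omega - 2 pi k/N)}) conj(G_j(e^{j(omega - 2 pi k/N)}))."""
    return sum(dtft(gi, omega - 2 * math.pi * k / N) * dtft(gj, omega - 2 * math.pi * k / N).conjugate()
               for k in range(N))


N = 4
# DFT filters from (8.2): g_{i,n} = W_N^{-in} / sqrt(N) = exp(j 2 pi i n / N) / sqrt(N)
g = [[cmath.exp(2j * math.pi * i * n / N) / math.sqrt(N) for n in range(N)] for i in range(N)]

gram = [[cross_alias_sum(g[i], g[j], N, 0.9) for j in range(N)] for i in range(N)]
print([[round(abs(v), 9) for v in row] for row in gram])
# [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0], [0.0, 0.0, 0.0, 4.0]]
```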

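Deterministic Crosscorrelation of Filters Calling $c_{i,j}$ the deterministic crosscorrelation of $g_i$ and $g_j$:

$$
\begin{aligned}
\langle g_{i,n}, g_{j,n-Nk}\rangle_n = c_{i,j,Nk} = 0
&\;\xleftrightarrow{\text{Matrix view}}\; D_N C_{i,j} U_N = 0\\
&\;\xleftrightarrow{\text{ZT}}\; \sum_{k=0}^{N-1} C_{i,j}(W_N^k z) = 0\\
&\;\xleftrightarrow{\text{DTFT}}\; \sum_{k=0}^{N-1} C_{i,j}(e^{j(\omega-(2\pi/N)k)}) = 0
\end{aligned}
\tag{8.10}
$$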

8.2.2 Polyphase View of N-Channel Filter Banks

To cover the polyphase view for general $N$, we go through an example with $N = 3$, and then briefly summarize the discussion for a general $N$.
Example 8.1 (Orthogonal 3-channel filter banks) For two-channel filter banks, a polyphase decomposition is achieved by simply splitting both sequences and filters into their even- and odd-indexed subsequences; for 3-channel filter banks, we split sequences and filters into subsequences modulo 3. While we have seen the expression for a polyphase representation of a sequence and filters for a general $N$ in (2.222), we write them out for $N = 3$ to develop some intuition, starting with the input sequence $x$:

$$x_{0,n} = x_{3n} \;\xleftrightarrow{\text{ZT}}\; X_0(z) = \sum_{n\in\mathbb{Z}} x_{3n} z^{-n},$$
$$x_{1,n} = x_{3n+1} \;\xleftrightarrow{\text{ZT}}\; X_1(z) = \sum_{n\in\mathbb{Z}} x_{3n+1} z^{-n},$$
$$x_{2,n} = x_{3n+2} \;\xleftrightarrow{\text{ZT}}\; X_2(z) = \sum_{n\in\mathbb{Z}} x_{3n+2} z^{-n},$$

$$X(z) = X_0(z^3) + z^{-1}X_1(z^3) + z^{-2}X_2(z^3).$$

In the above, $x_0$ is the subsequence of $x$ at multiples of 3 downsampled by 3, and similarly for $x_1$ and $x_2$:

$$x_0 = [\,\cdots\; x_{-3}\; \boxed{x_0}\; x_3\; x_6\; \cdots\,]^T,$$
$$x_1 = [\,\cdots\; x_{-2}\; \boxed{x_1}\; x_4\; x_7\; \cdots\,]^T,$$
$$x_2 = [\,\cdots\; x_{-1}\; \boxed{x_2}\; x_5\; x_8\; \cdots\,]^T.$$



This is illustrated in Figure 8.3(a): to get $x_0$, we simply keep every third sample from $x$; to get $x_1$, we shift $x$ by one to the left (advance by one, represented by $z$) and then keep every third sample; finally, to get $x_2$, we shift $x$ by two to the left and then keep every third sample. To get the original sequence back, we upsample each subsequence by 3, shift appropriately to the right (delays represented by $z^{-1}$ and $z^{-2}$), and sum up.
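The modulo-3 split and its inverse described above take only a few lines. This sketch assumes a finite sequence indexed from $n = 0$ (the text works with two-sided sequences), uses illustrative function names, and also checks $X(z) = X_0(z^3) + z^{-1}X_1(z^3) + z^{-2}X_2(z^3)$ at one sample point $z$.

```python
def polyphase_split(x, N=3):
    """Return [x_0, ..., x_{N-1}] with x_{j,n} = x_{Nn+j} (zero-based finite list)."""
    return [x[j::N] for j in range(N)]


def polyphase_merge(parts):
    """Inverse of polyphase_split: put component j back on the positions Nn + j."""
    N = len(parts)
    x = [None] * sum(len(p) for p in parts)
    for j, p in enumerate(parts):
        x[j::N] = p
    return x


def z_transform(x, z):
    """X(z) = sum_n x_n z^{-n} for a finite sequence starting at n = 0."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


x = [4, -1, 2, 7, 0, 3, 5, 1, -2, 6]
parts = polyphase_split(x)
print(parts)  # [[4, 7, 5, 6], [-1, 0, 1], [2, 3, -2]]

z = 0.8 + 0.3j
rhs = sum(z ** (-j) * z_transform(p, z ** 3) for j, p in enumerate(parts))
print(abs(z_transform(x, z) - rhs) < 1e-9)  # True
```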

Using (2.222), we define the polyphase decomposition of the synthesis filters:

$$g_{i,0,n} = g_{i,3n} \;\xleftrightarrow{\text{ZT}}\; G_{i,0}(z) = \sum_{n\in\mathbb{Z}} g_{i,3n} z^{-n}, \tag{8.11a}$$
$$g_{i,1,n} = g_{i,3n+1} \;\xleftrightarrow{\text{ZT}}\; G_{i,1}(z) = \sum_{n\in\mathbb{Z}} g_{i,3n+1} z^{-n}, \tag{8.11b}$$
$$g_{i,2,n} = g_{i,3n+2} \;\xleftrightarrow{\text{ZT}}\; G_{i,2}(z) = \sum_{n\in\mathbb{Z}} g_{i,3n+2} z^{-n}, \tag{8.11c}$$

$$G_i(z) = G_{i,0}(z^3) + z^{-1}G_{i,1}(z^3) + z^{-2}G_{i,2}(z^3),$$

where the first subscript denotes the filter, the second the polyphase component, and the last, the discrete time index. In the above, we split each synthesis filter into its subsequences modulo 3 as we have done for the input sequence $x$:



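$$g_{i,0} = [\,\cdots\; g_{i,-3}\; \boxed{g_{i,0}}\; g_{i,3}\; g_{i,6}\; \cdots\,]^T,$$
$$g_{i,1} = [\,\cdots\; g_{i,-2}\; \boxed{g_{i,1}}\; g_{i,4}\; g_{i,7}\; \cdots\,]^T,$$
$$g_{i,2} = [\,\cdots\; g_{i,-1}\; \boxed{g_{i,2}}\; g_{i,5}\; g_{i,8}\; \cdots\,]^T.$$

We can now define the polyphase matrix $\Phi_p(z)$:

$$\Phi_p(z) = \begin{bmatrix} G_{0,0}(z) & G_{1,0}(z) & G_{2,0}(z) \\ G_{0,1}(z) & G_{1,1}(z) & G_{2,1}(z) \\ G_{0,2}(z) & G_{1,2}(z) & G_{2,2}(z) \end{bmatrix}.$$

The matrix above is on the synthesis side; to get it on the analysis side, we define the polyphase decomposition of analysis filters using (2.222) and similarly to what we have done in the two-channel case: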



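$$\tilde{g}_{i,0,n} = \tilde{g}_{i,3n} = g_{i,-3n} \;\xleftrightarrow{\text{ZT}}\; \tilde{G}_{i,0}(z) = \sum_{n\in\mathbb{Z}} g_{i,-3n}\, z^{-n},$$
$$\tilde{g}_{i,1,n} = \tilde{g}_{i,3n-1} = g_{i,-3n+1} \;\xleftrightarrow{\text{ZT}}\; \tilde{G}_{i,1}(z) = \sum_{n\in\mathbb{Z}} g_{i,-3n+1}\, z^{-n},$$
$$\tilde{g}_{i,2,n} = \tilde{g}_{i,3n-2} = g_{i,-3n+2} \;\xleftrightarrow{\text{ZT}}\; \tilde{G}_{i,2}(z) = \sum_{n\in\mathbb{Z}} g_{i,-3n+2}\, z^{-n},$$

$$\tilde{G}_i(z) = \tilde{G}_{i,0}(z^3) + z\,\tilde{G}_{i,1}(z^3) + z^2\,\tilde{G}_{i,2}(z^3).$$

The three polyphase components are:

$$\tilde{g}_{i,0} = [\,\cdots\; \tilde{g}_{i,-3}\; \boxed{\tilde{g}_{i,0}}\; \tilde{g}_{i,3}\; \tilde{g}_{i,6}\; \cdots\,]^T = [\,\cdots\; g_{i,3}\; \boxed{g_{i,0}}\; g_{i,-3}\; g_{i,-6}\; \cdots\,]^T,$$
$$\tilde{g}_{i,1} = [\,\cdots\; \tilde{g}_{i,-4}\; \boxed{\tilde{g}_{i,-1}}\; \tilde{g}_{i,2}\; \tilde{g}_{i,5}\; \cdots\,]^T = [\,\cdots\; g_{i,4}\; \boxed{g_{i,1}}\; g_{i,-2}\; g_{i,-5}\; \cdots\,]^T,$$
$$\tilde{g}_{i,2} = [\,\cdots\; \tilde{g}_{i,-5}\; \boxed{\tilde{g}_{i,-2}}\; \tilde{g}_{i,1}\; \tilde{g}_{i,4}\; \cdots\,]^T = [\,\cdots\; g_{i,5}\; \boxed{g_{i,2}}\; g_{i,-1}\; g_{i,-4}\; \cdots\,]^T.$$

Note that $\tilde{G}_{i,j}(z) = G_{i,j}(z^{-1})$, as in the two-channel case. With these definitions, the analysis polyphase matrix is:

$$\tilde{\Phi}_p(z) = \begin{bmatrix} G_{0,0}(z^{-1}) & G_{1,0}(z^{-1}) & G_{2,0}(z^{-1}) \\ G_{0,1}(z^{-1}) & G_{1,1}(z^{-1}) & G_{2,1}(z^{-1}) \\ G_{0,2}(z^{-1}) & G_{1,2}(z^{-1}) & G_{2,2}(z^{-1}) \end{bmatrix} = \Phi_p(z^{-1}).$$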



Figure 8.3 shows the polyphase implementation of the system, with the reconstruction of the original sequence using the synthesis polyphase matrix on the right¹¹⁵ and the computation of projection sequences $\alpha_i$ on the left; note that



115 Remember that we typically put the lowpass filter in the lower branch, but in matrices it 
appears in the first row/column, leading to a slight inconsistency when the filter bank is depicted 
in the polyphase domain. 



Figure 8.3: A 3-channel analysis/synthesis filter bank in polyphase domain. 



as usual, the analysis matrix (polyphase here) is taken as a transpose (to check it, we could mimic what we did in Section 7.2.4; we skip it here).

The upshot of all this algebra is that we now have a very compact input-output relationship between the input (decomposed into polyphase components) and the result coming out of the synthesis filter bank:

$$\hat{X}(z) = \begin{bmatrix} 1 & z^{-1} & z^{-2} \end{bmatrix} \Phi_p(z^3)\, \Phi_p^*(z^{-3}) \begin{bmatrix} X_0(z^3) \\ X_1(z^3) \\ X_2(z^3) \end{bmatrix}.$$

Note that we use the Hermitian transpose here because we will often deal with complex-coefficient filter banks in this chapter. The conjugation is applied only to coefficients and not to $z$. The above example went through various polyphase concepts for an orthogonal 3-channel filter bank. We now summarize the same concepts for a general, biorthogonal $N$-channel filter bank, and characterize classes of solutions using polyphase machinery.

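Using (2.222), in an $N$-channel filter bank, the polyphase decomposition of the input sequence, synthesis and analysis filters, respectively, is given by:

$$x_{j,n} = x_{Nn+j} \;\xleftrightarrow{\text{ZT}}\; X_j(z) = \sum_{n\in\mathbb{Z}} x_{Nn+j}\, z^{-n}, \tag{8.12a}$$
$$X(z) = \sum_{j=0}^{N-1} z^{-j} X_j(z^N), \tag{8.12b}$$
$$g_{i,j,n} = g_{i,Nn+j} \;\xleftrightarrow{\text{ZT}}\; G_{i,j}(z) = \sum_{n\in\mathbb{Z}} g_{i,Nn+j}\, z^{-n}, \tag{8.12c}$$
$$G_i(z) = \sum_{j=0}^{N-1} z^{-j} G_{i,j}(z^N), \tag{8.12d}$$
$$\tilde{g}_{i,j,n} = \tilde{g}_{i,Nn-j} \;\xleftrightarrow{\text{ZT}}\; \tilde{G}_{i,j}(z) = \sum_{n\in\mathbb{Z}} \tilde{g}_{i,Nn-j}\, z^{-n}, \tag{8.12e}$$
$$\tilde{G}_i(z) = \sum_{j=0}^{N-1} z^{j}\, \tilde{G}_{i,j}(z^N), \tag{8.12f}$$

leading to the corresponding polyphase matrices:

$$\Phi_p(z) = \begin{bmatrix} G_{0,0}(z) & G_{1,0}(z) & \cdots & G_{N-1,0}(z) \\ G_{0,1}(z) & G_{1,1}(z) & \cdots & G_{N-1,1}(z) \\ \vdots & \vdots & \ddots & \vdots \\ G_{0,N-1}(z) & G_{1,N-1}(z) & \cdots & G_{N-1,N-1}(z) \end{bmatrix}, \tag{8.13a}$$

$$\tilde{\Phi}_p(z) = \begin{bmatrix} \tilde{G}_{0,0}(z) & \tilde{G}_{1,0}(z) & \cdots & \tilde{G}_{N-1,0}(z) \\ \tilde{G}_{0,1}(z) & \tilde{G}_{1,1}(z) & \cdots & \tilde{G}_{N-1,1}(z) \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{G}_{0,N-1}(z) & \tilde{G}_{1,N-1}(z) & \cdots & \tilde{G}_{N-1,N-1}(z) \end{bmatrix}. \tag{8.13b}$$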

This formulation allows us to characterize classes of solutions. We state these 
without proof as they follow easily from the equivalent two-channel filter bank 
results, and can be found in the literature. 



Theorem 8.1 (N-channel filter banks in polyphase domain) Given is an $N$-channel filter bank and the polyphase matrices $\Phi_p(z)$, $\tilde{\Phi}_p(z)$. Then:

(i) The filter bank implements a biorthogonal expansion if and only if

$$\Phi_p(z)\, \tilde{\Phi}_p^*(z) = I. \tag{8.14a}$$

(ii) The filter bank implements an orthonormal expansion if and only if

$$\Phi_p(z)\, \Phi_p^*(z^{-1}) = I, \tag{8.14b}$$

that is, $\Phi_p(z)$ is paraunitary.

(iii) The filter bank implements an FIR biorthogonal expansion if and only if $\Phi_p(z)$ is unimodular (within scaling), that is, if

$$\det(\Phi_p(z)) = \alpha z^{-k}. \tag{8.14c}$$

As in Example 8.1, the Hermitian transpose in (8.14b) conjugates only the coefficients, not $z$.



Design of N-Channel Filter Banks In the next two sections, we will discuss two particular $N$-channel filter bank design options, in particular, those that add localization features to the DFT. To design general $N$-channel orthogonal filter banks, we must design $N \times N$ paraunitary matrices. As in the two-channel case, where such matrices can be obtained by a lattice factorization (see Section 7.3.3), $N \times N$ paraunitary matrices can be parameterized in terms of elementary matrices ($2 \times 2$ rotations and delays). Here, we just give an example of a design of a $3 \times 3$ paraunitary matrix leading to a 3-channel orthogonal filter bank; pointers to literature are given in Further Reading.

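Example 8.2 (Orthogonal N-channel filter banks) One way of parameterizing paraunitary matrices is via the following factorization:

$$\Phi_p(z) = U_0 \prod_{k=1}^{K-1} \mathrm{diag}([z^{-1}, 1, 1])\, U_k, \tag{8.15a}$$

where

$$U_0 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_{01} & -\sin\theta_{01} \\ 0 & \sin\theta_{01} & \cos\theta_{01} \end{bmatrix}
\begin{bmatrix} \cos\theta_{02} & 0 & -\sin\theta_{02} \\ 0 & 1 & 0 \\ \sin\theta_{02} & 0 & \cos\theta_{02} \end{bmatrix}
\begin{bmatrix} \cos\theta_{00} & -\sin\theta_{00} & 0 \\ \sin\theta_{00} & \cos\theta_{00} & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

$$U_k = \begin{bmatrix} \cos\theta_{k0} & -\sin\theta_{k0} & 0 \\ \sin\theta_{k0} & \cos\theta_{k0} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_{k1} & -\sin\theta_{k1} \\ 0 & \sin\theta_{k1} & \cos\theta_{k1} \end{bmatrix}. \tag{8.15b}$$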



The degrees of freedom in design are given by the angles $\theta_{kj}$. This freedom in
design allows for constructions of orthogonal and linear-phase FIR solutions, not 
possible in the two-channel case. 
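The factorization (8.15a)-(8.15b) can be checked numerically: since every $U_k$ is a real rotation and $\mathrm{diag}([z^{-1}, 1, 1])\,\mathrm{diag}([z, 1, 1]) = I$, the product should satisfy $\Phi_p(z)\Phi_p^T(z^{-1}) = I$ for any $z \ne 0$, and be unitary on the unit circle. This sketch uses randomly drawn angles and $K = 4$ purely for illustration; the helper names and the angle layout in `theta` are assumptions of the sketch.

```python
import cmath
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def rotation(p, q, theta):
    """3x3 Givens rotation on coordinates p < q: cos on the diagonal, -sin above, +sin below."""
    R = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    c, s = math.cos(theta), math.sin(theta)
    R[p][p], R[p][q], R[q][p], R[q][q] = c, -s, s, c
    return R


def polyphase(z, theta):
    """Phi_p(z) = U_0 prod_{k=1}^{K-1} diag(z^{-1}, 1, 1) U_k, following (8.15a)-(8.15b).
    theta[0] = (theta_00, theta_01, theta_02); theta[k] = (theta_k0, theta_k1) for k >= 1."""
    t00, t01, t02 = theta[0]
    Phi = matmul(matmul(rotation(1, 2, t01), rotation(0, 2, t02)), rotation(0, 1, t00))
    delay = [[1 / z, 0, 0], [0, 1, 0], [0, 0, 1]]
    for tk0, tk1 in theta[1:]:
        Uk = matmul(rotation(0, 1, tk0), rotation(1, 2, tk1))
        Phi = matmul(matmul(Phi, delay), Uk)
    return Phi


random.seed(1)
K = 4
theta = [tuple(random.uniform(-math.pi, math.pi) for _ in range(3))]
theta += [tuple(random.uniform(-math.pi, math.pi) for _ in range(2)) for _ in range(K - 1)]

# Real coefficients, so conjugating coefficients changes nothing: Phi_p^*(z^{-1}) = Phi_p(z^{-1})^T.
z = 1.3 + 0.4j
P = matmul(polyphase(z, theta), transpose(polyphase(1 / z, theta)))
err = max(abs(P[i][j] - (1 if i == j else 0)) for i in range(3) for j in range(3))
print(err < 1e-12)  # True
```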

8.3 Complex Exponential-Modulated Local Fourier Bases

At the start of the previous section, we considered two extreme cases of local Fourier representations implementable by $N$-channel filter banks: those based on the DFT (box in time/sinc in frequency, good time/poor frequency localization) and those based on ideal bandpass filters (sinc in time/box in frequency, good frequency/poor time localization). These two particular representations have something in common; as implemented via $N$-channel orthogonal filter banks, they are both obtained through a complex-exponential modulation of a single prototype filter.

Complex-Exponential Modulation Given a prototype filter $p = g_0$, the rest of the filters are obtained via complex-exponential modulation:

$$g_{i,n} = p_n e^{j(2\pi/N)in} = p_n W_N^{-in} \;\xleftrightarrow{\text{ZT}}\; G_i(z) = P(W_N^i z) \;\xleftrightarrow{\text{DTFT}}\; G_i(e^{j\omega}) = P(e^{j(\omega-(2\pi/N)i)}) = P(W_N^i e^{j\omega}), \tag{8.16}$$

for $i = 1, 2, \dots, N-1$. This is clearly true for the DFT basis from (8.2), (8.5), as well as that constructed from the ideal filters (8.6) (see also Exercise 8.4). A filter bank implementing such an expansion is often called a complex exponential-modulated filter bank. While the prototype filter $p = g_0$ is typically real, the rest of the bandpass filters are complex.
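The time-domain and z-domain forms of (8.16) can be compared numerically. In this sketch the prototype coefficients and the evaluation point $z$ are arbitrary illustrative values, not taken from the text; it also confirms that modulating the length-$N$ box reproduces the DFT basis (8.2).

```python
import cmath
import math


def modulate(p, N, i):
    """g_{i,n} = p_n W_N^{-in} = p_n exp(j 2 pi i n / N), the time-domain form of (8.16)."""
    return [pn * cmath.exp(2j * math.pi * i * n / N) for n, pn in enumerate(p)]


def z_transform(x, z):
    """X(z) = sum_n x_n z^{-n} for a finite sequence starting at n = 0."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


N = 4
W = cmath.exp(-2j * math.pi / N)          # W_N = e^{-j 2 pi / N}
p = [0.1, 0.4, 0.7, 0.4, 0.1, -0.05]      # arbitrary real prototype (illustrative values)
z = 0.9 * cmath.exp(0.3j)

for i in range(N):
    # G_i(z) should equal P(W_N^i z)
    assert abs(z_transform(modulate(p, N, i), z) - z_transform(p, W ** i * z)) < 1e-12

box = [1 / math.sqrt(N)] * N              # DFT prototype: modulation gives g_{i,n} = W_N^{-in}/sqrt(N)
print(all(abs(modulate(box, N, 2)[n] - W ** (-2 * n) / math.sqrt(N)) < 1e-12 for n in range(N)))  # True
```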

8.3.1 Balian-Low Theorem 

We are now back to the question of whether we can find complex exponential-modulated local Fourier bases with a trade-off of time and frequency localization between those we have seen for the DFT and sinc bases in Figure 8.2. To that end, we might want to worsen the time localization of the DFT a bit in the hope of improving the frequency one; unfortunately, the following result excludes the possibility of having complex exponential-modulated local Fourier bases with support longer than $N$.¹¹⁶

Theorem 8.2 (Discrete Balian-Low theorem) There does not exist a complex exponential-modulated local Fourier basis implementable by an $N$-channel FIR filter bank, except for a filter bank with filters of length $N$.



Proof. To prove the theorem, we analyze the structure of the polyphase matrix of a complex exponential-modulated filter bank with filters as in (8.16). Given the polyphase representation (8.12d) of the prototype filter $p = g_0$,

$$P(z) = P_0(z^N) + z^{-1}P_1(z^N) + \dots + z^{-(N-1)}P_{N-1}(z^N),$$

the modulated versions become

$$G_i(z) = P(W_N^i z) = P_0(z^N) + W_N^{-i} z^{-1} P_1(z^N) + \dots + W_N^{-(N-1)i} z^{-(N-1)} P_{N-1}(z^N),$$

for $i = 1, 2, \dots, N-1$. As an example, for $N = 3$, the polyphase matrix is

$$\Phi_p(z) = \begin{bmatrix} P_0(z) & P_0(z) & P_0(z) \\ P_1(z) & W_3^{-1}P_1(z) & W_3^{-2}P_1(z) \\ P_2(z) & W_3^{-2}P_2(z) & W_3^{-1}P_2(z) \end{bmatrix}
= \begin{bmatrix} P_0(z) & & \\ & P_1(z) & \\ & & P_2(z) \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & W_3^{-1} & W_3^{-2} \\ 1 & W_3^{-2} & W_3^{-1} \end{bmatrix}, \tag{8.17}$$

that is, a product of a diagonal matrix of prototype filter polyphase components and the conjugated DFT matrix (2.161a). According to Theorem 8.1, this filter bank implements an FIR biorthogonal expansion if and only if $\det\Phi_p(z)$ is a monomial. So,

$$\det(\Phi_p(z)) = \Bigl(\prod_{j=0}^{N-1} P_j(z)\Bigr)\det(F^*) \tag{8.18}$$

is a monomial if and only if each polyphase component is (since $\det(F^*)$ is a nonzero constant); in other words, each polyphase component of $P(z)$ has exactly one nonzero term, or, $P(z)$ has $N$ nonzero coefficients (one from each polyphase component).



116 This result is known as the Balian-Low theorem in the continuous-domain setting; see Section 11.4.1.
While the above theorem is a negative result in general, the proof shows the factorization (8.17) that can be used to derive a fast algorithm, shown in Section 8.5 (the same factorization is used in Solved Exercise 8.2 to derive the relationship between the modulation and polyphase matrices). Rewriting (8.17) for general $N$, together with its analysis polyphase counterpart, gives

$$\Phi_p(z) = \mathrm{diag}([P_0(z), P_1(z), \ldots, P_{N-1}(z)])\, F^*, \qquad (8.19a)$$

$$\tilde{\Phi}_p(z) = \mathrm{diag}([\tilde{G}_{0,0}(z), \tilde{G}_{0,1}(z), \ldots, \tilde{G}_{0,N-1}(z)])\, F^*, \qquad (8.19b)$$

and this filter bank implements a basis expansion if and only if

$$\mathrm{diag}([P_0(z), \ldots, P_{N-1}(z)])\, \mathrm{diag}([\tilde{G}_{0,0}(z), \ldots, \tilde{G}_{0,N-1}(z)])^* = \frac{z^{-k}}{N}\, I,$$

a more constrained condition than the general one, $\Phi_p(z)\tilde{\Phi}_p^*(z) = z^{-k} I$ ($N$ appears here since we are using the unnormalized version of $F$, for which $F^*F = NI$). We also see exactly what the problem is in trying to have an orthogonal complex exponential-modulated filter bank with filters of length longer than $N$: if the filter bank were orthogonal, then $\tilde{G}_0(z) = G_0(z^{-1}) = P(z^{-1})$, and the above would reduce to

$$\begin{aligned}
\Phi_p(z)\Phi_p^*(z^{-1}) &= \mathrm{diag}([P_0(z), P_1(z), \ldots, P_{N-1}(z)])\, F^*F\, \mathrm{diag}([P_0(z^{-1}), P_1(z^{-1}), \ldots, P_{N-1}(z^{-1})])^* \\
&= N\, \mathrm{diag}([P_0(z)P_0(z^{-1}), P_1(z)P_1(z^{-1}), \ldots, P_{N-1}(z)P_{N-1}(z^{-1})]) = I,
\end{aligned}$$

possible with FIR filters if and only if each polyphase component $P_j(z)$ of the prototype filter $P(z)$ is exactly of length 1 (we assumed the prototype to be real). Figure 8.4 depicts a complex exponential-modulated filter bank with $N = 3$ channels. Solved Exercise 8.2 explores relationships between various matrix representations of a 3-channel complex exponential-modulated filter bank.
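The factorization (8.19b) can be checked numerically. The sketch below assumes analysis filters $\tilde{g}_{i,n} = p_n W_N^{-in}$ with $W_N = e^{-j2\pi/N}$ (our reading of Figure 8.4, with an arbitrary real analysis prototype called `p` in the code), and the convention that the analysis bank filters and then keeps the samples at multiples of $N$. It computes the channel outputs once by direct filtering and once by $N$ short polyphase convolutions followed by one $N$-point DFT per output instant; the function names are ours.

```python
import numpy as np

def direct_analysis(x, p, N):
    """Filter x with g_i[n] = p[n] W_N^{-in} = p[n] exp(+2j*pi*i*n/N), keep every Nth sample."""
    n = np.arange(len(p))
    rows = []
    for i in range(N):
        g = p * np.exp(2j * np.pi * i * n / N)   # modulated prototype
        rows.append(np.convolve(x, g)[::N])     # filter, then downsample by N
    return np.array(rows)

def polyphase_analysis(x, p, N):
    """Same outputs via N polyphase convolutions and one N-point DFT per output time."""
    L = len(x) + len(p) - 1                     # length of each full convolution
    M = -(-L // N)                              # number of kept outputs, ceil(L / N)
    xp = np.concatenate([x, np.zeros(M * N - len(x))])   # zero-pad so x[m*N - j] exists
    v = np.zeros((N, M), dtype=complex)
    for j in range(N):
        pj = p[j::N]                            # polyphase component p_j[l] = p[N*l + j]
        # u_j[m] = x[m*N - j], with x[n] = 0 for n < 0
        uj = np.array([xp[m * N - j] if m * N - j >= 0 else 0.0 for m in range(M)])
        v[j] = np.convolve(uj, pj)[:M]
    # y_i[m] = sum_j W_N^{-ij} v_j[m]; numpy's ifft divides by N, so multiply back
    return N * np.fft.ifft(v, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
p = rng.standard_normal(12)
print(np.allclose(direct_analysis(x, p, 4), polyphase_analysis(x, p, 4)))
```

The polyphase route replaces $N$ full-length convolutions by $N$ convolutions with filters of length about $L/N$ plus one DFT per block of $N$ input samples; in practice that DFT would be an FFT.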

8.3.2 Application to Power Spectral Density Estimation 

We now discuss the computation of periodograms, a widely used application of complex exponential-modulated filter banks.¹¹⁷

Given a discrete stochastic process, there exist various ways to estimate its autocorrelation. In Chapter 2 we have seen that, for a discrete WSS process, there is a direct link between its autocorrelation and its power spectral density, given by (2.232). However, when the process is changing over time and we are interested in local behavior, we need a local estimate of the autocorrelation, and therefore, a local power spectral density. We thus need a local Fourier transform: we window the sequence appropriately and square the magnitudes of the Fourier coefficients to obtain a local power spectral density.



117 The terms periodogram and spectrogram should not be confused with each other: the former 
computes the estimate of the power spectral density of a sequence, while the latter shows the 
dependence of the power spectral density on time. 
Figure 8.4: A 3-channel analysis/synthesis complex exponential-modulated filter bank, with analysis filters $\tilde{G}_i(z) = \tilde{G}_0(W_3^i z)$ and synthesis filters $G_i(z) = G_0(W_3^i z) = P(W_3^i z)$. $F$ and $F^*$ are unnormalized DFT matrices (thus $3\times$ at the output) and are implemented using FFTs.



Block-Based Power Spectral Density Estimation A straightforward way to estimate local power spectral density is simply to cut the sequence into adjacent, but nonoverlapping, blocks of size $M$,

$$[\,\ldots\;\; x_{-1}\;\; x_0\;\; x_1\;\; \ldots\,] = [\,\ldots\;\; \underbrace{x_{nM}\;\cdots\;x_{nM+M-1}}_{b_n}\;\; \underbrace{x_{(n+1)M}\;\cdots\;x_{(n+1)M+M-1}}_{b_{n+1}}\;\; \ldots\,], \qquad (8.20)$$

with $b_n$ the $n$th block of length $M$, and then take a length-$M$ DFT of each block,

$$B_n = F\, b_n, \qquad (8.21)$$

with $F$ from (2.161a). Squaring the magnitudes of the elements of $B_n$ leads to an approximation of a local power spectral density, known as a periodogram.

While this method is simple and computationally attractive (one length-$M$ FFT per block of $M$ samples, that is, on the order of $\log M$ operations per input sample), it has a major drawback, which we show through a simple example: let the sequence be white noise, that is, $x_n$ is i.i.d. with variance $\sigma_x^2$. Since $F$ is a unitary transform (within scaling), or, a rotation in $M$ dimensions, the entries of $B_n$ are i.i.d. with variance $\sigma_x^2$, independently of $M$ (see Exercise 8.6). The power spectral density is a constant, but while the resolution increases with $M$, the variance of the estimate does not diminish.
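The claim that the variance does not diminish with $M$ is easy to observe numerically. The following sketch cuts white Gaussian noise with $\sigma_x^2 = 1$ into blocks as in (8.20), applies (8.21), and reports, for several $M$, the mean of the estimate and its relative spread (standard deviation divided by mean) over the bins strictly between DC and $\pi$. The function names, the seed, and the $1/M$ normalization of the periodogram are our assumptions.

```python
import numpy as np

def block_periodogram(x, M):
    """|F b_n|^2 / M for each non-overlapping length-M block b_n, as in (8.20)-(8.21).

    Returns an array of shape (len(x) // M, M). The 1/M factor (our choice)
    makes the estimate of unit-variance white noise have mean 1 in every bin.
    """
    K = len(x) // M                       # number of complete blocks
    blocks = x[:K * M].reshape(K, M)      # row n is b_n = [x_{nM}, ..., x_{nM+M-1}]
    B = np.fft.fft(blocks, axis=1)        # B_n = F b_n with the unnormalized DFT
    return np.abs(B) ** 2 / M

def relative_spread(A):
    """Standard deviation over mean, using bins 1 .. M/2 - 1 (skips DC and Nyquist)."""
    M = A.shape[1]
    vals = A[:, 1:M // 2]
    return vals.std() / vals.mean()

rng = np.random.default_rng(0)
x = rng.standard_normal(2 ** 16)          # white Gaussian noise, sigma_x^2 = 1

for M in (16, 64, 256, 1024):
    A = block_periodogram(x, M)
    print(M, round(A.mean(), 3), round(relative_spread(A), 3))
```

For every $M$ the mean stays near $\sigma_x^2 = 1$ while the relative spread stays near 1: a finer frequency grid does not buy a less noisy estimate.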



Example 8.3 (Block-based power spectral density estimation) Consider a source generated by filtering white Gaussian noise with a causal filter

$$H(z) = \frac{1 - a z}{1 - 2\beta\cos\omega_0\, z + \beta^2 z^2}, \qquad \mathrm{ROC} = \{z \mid |z| > 1/\beta\}, \qquad (8.22)$$

where $a, \beta$ are real and $1 < \beta < \infty$. This filter has poles at $(1/\beta)e^{\pm j\omega_0}$ and zeros at $1/a$ and $\infty$. The power spectral density of $y = h * x$ is

$$A_y(e^{j\omega}) = \frac{|1 - a e^{-j\omega}|^2}{|1 - 2\beta\cos\omega_0\, e^{-j\omega} + \beta^2 e^{-j2\omega}|^2}, \qquad (8.23)$$

plotted in Figure 8.5(a) for $a = 1.1$, $\beta = 1.1$ and $\omega_0 = 2\pi/3$. Figures 8.5(b), (c) and (d) show the power spectral density calculated using (8.21) on the blocked version of $y_n$, with blocks of length $M = 64$, 256 and 1024. While the shape of the power spectral density can be guessed, the variance indeed does not diminish.

Figure 8.5: Power spectral density from (8.23). (a) Theoretical, as well as local estimates computed using (8.21) on the blocked version of the source $y_n$, with blocks of length (b) $M = 64$, (c) $M = 256$ and (d) $M = 1024$, respectively.

Figure 8.6: Averaged power spectral density from (8.25). The theoretical power spectral density is the same as in Figure 8.5(a). (a) Average of 16 blocks of length 64. (b) Average of 4 blocks of length 256.
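As a sanity check on (8.22) and (8.23), the sketch below realizes $H(z)$ causally through the difference equation obtained by multiplying numerator and denominator by $z^{-2}$, namely $\beta^2 y_n = 2\beta\cos\omega_0\, y_{n-1} - y_{n-2} + x_{n-2} - a x_{n-1}$. It verifies that the poles have modulus $1/\beta$ and compares the squared magnitude of the DTFT of the impulse response with (8.23) for unit-variance input, using the parameter values of Figure 8.5. The function names are ours; feeding `source_filter` white Gaussian noise gives a realization $y_n$ of the kind used in the figures.

```python
import numpy as np

def source_filter(x, a, beta, w0):
    """Causal realization (ROC |z| > 1/beta) of H(z) = (1 - a z) / (1 - 2 beta cos(w0) z + beta^2 z^2):
    beta^2 y[n] = 2 beta cos(w0) y[n-1] - y[n-2] + x[n-2] - a x[n-1]."""
    y = np.zeros(len(x))
    c = 2 * beta * np.cos(w0)
    for n in range(len(x)):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y[n] = (c * y1 - y2 + x2 - a * x1) / beta ** 2
    return y

def psd_theory(w, a, beta, w0):
    """Power spectral density (8.23) for unit-variance white input."""
    e = np.exp(-1j * w)
    num = np.abs(1 - a * e) ** 2
    den = np.abs(1 - 2 * beta * np.cos(w0) * e + beta ** 2 * e ** 2) ** 2
    return num / den

a, beta, w0 = 1.1, 1.1, 2 * np.pi / 3
# roots of beta^2 z^2 - 2 beta cos(w0) z + 1, i.e., the poles of H(z)
poles = np.roots([beta ** 2, -2 * beta * np.cos(w0), 1])
print(np.abs(poles))                      # both equal to 1/beta

# impulse response of the recursion, and |DTFT|^2 on a dense grid
h = source_filter(np.r_[1.0, np.zeros(511)], a, beta, w0)
Nfft = 4096
w = 2 * np.pi * np.arange(Nfft) / Nfft
err = np.max(np.abs(np.abs(np.fft.fft(h, Nfft)) ** 2 - psd_theory(w, a, beta, w0)))
print(err)
```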



Averaged Block-Based Power Spectral Density Estimation When the sequence is stationary, the obvious fix is to average several power spectra. Calling $A_n^{(1)}$ the block-based power spectral density, or, from (8.21),

$$A_n^{(1)} = |B_n|^2 = |F b_n|^2, \qquad (8.24)$$

with the squared magnitude taken entrywise, we can define an averaged local power spectrum by summing $K$ successive ones,

$$A^{(K)} = \frac{1}{K}\sum_{n=0}^{K-1} A_n^{(1)}, \qquad (8.25)$$

known as an averaged periodogram. Exercise 8.6 shows that the variance of $A^{(K)}$ is about $1/K$ the variance of $A_n^{(1)}$. Given a length-$L$ ($L = KM$) realization of a stationary process, we can now vary $M$ or $K$ to achieve a trade-off between spectral resolution (large $M$) and small variance (large $K$).
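A quick numerical illustration of the $1/K$ rule: the sketch below averages $K$ successive block periodograms of unit-variance white Gaussian noise as in (8.25), with $M = 64$ fixed. Since the variance scales as $1/K$, the standard deviation relative to the mean should scale as $1/\sqrt{K}$. Grouping the blocks into disjoint sets of $K$, the $1/M$ scaling, and the seed are our assumptions.

```python
import numpy as np

def averaged_periodogram(x, M, K):
    """Averaged periodogram (8.25): mean of K successive block periodograms.

    The sequence is split into disjoint groups of K blocks of length M, giving
    one averaged estimate per group (one row each). Each |F b_n|^2 is scaled
    by 1/M, our normalization, so unit-variance white noise has mean 1.
    """
    G = len(x) // (K * M)                               # number of groups
    blocks = x[:G * K * M].reshape(G, K, M)             # [group, block, sample]
    A1 = np.abs(np.fft.fft(blocks, axis=2)) ** 2 / M    # A^{(1)} of every block
    return A1.mean(axis=1)                              # A^{(K)}: average over K blocks

rng = np.random.default_rng(1)
x = rng.standard_normal(2 ** 16)       # white Gaussian noise, variance 1
M = 64
for K in (1, 4, 16):
    A = averaged_periodogram(x, M, K)[:, 1:M // 2]      # bins strictly inside (0, pi)
    print(K, round(A.std() / A.mean(), 3), "vs 1/sqrt(K) =", round(1 / np.sqrt(K), 3))
```

Fixing $L$ and increasing $K$ shrinks $M = L/K$, which is exactly the resolution/variance trade-off described above.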
Figure 8.7: Rectangular, triangular and Hamming windows of length $M = 31$, centered at the origin. (a) Time-domain sequences. (b) DTFT magnitude responses (in dB).

Figure 8.8: Windowing with 50% overlap using a triangular window.



Example 8.4 (Averaged block-based power spectral density estimation) Continuing Example 8.3, we now have a realization of length $L = 1024$, but would like to reduce the variance of the estimate by averaging. While any factorization of 1024 into $K$ blocks of length $1024/K$, for $K = 2^i$, $i = 0, 1, \ldots, 10$, is possible, too many blocks lead to a poor frequency resolution ($K = 1$ was shown in Figure 8.5(d)). We consider two intermediate cases, 16 blocks of length 64 and 4 blocks of length 256, shown in Figure 8.6(a) and (b). These should be compared to Figure 8.5(b) and (c), where the same block sizes were used, but without averaging.



Estimation Using Windowing and Overlapping Blocks In practice, both the periodogram and its averaged version are computed using a window, and possibly overlapping blocks.

We first discuss windowing. In the simplest case, the block $b_n$ in (8.20) corresponds to applying a rectangular window from (2.13a), shifted to location $nM$ (most often the nonunit-norm version, so that the height of the window is 1). To smooth the boundary effects, smoother windows are used, of which many designs are possible. All windows provide a trade-off between the width of the main lobe (the breadth of the DTFT around zero, typically of the order of $1/M$) and the height of the side lobes (the other maxima of the DTFT). Exercise 8.7 considers a few such windows and their respective characteristics. The upshot is that the rectangular window has the narrowest main lobe but the highest side lobes, while others, such as the triangular window, have lower side lobes but broader main lobes. Figure 8.7 shows three commonly used windows and their DTFT magnitude responses in dB.

Figure 8.9: Filter-bank implementation of the periodogram and averaged periodogram using a complex exponential-modulated filter bank with filters (8.16). The sampling factor $N$ indicates the overlapping between blocks ($N = M$, basis; $N < M$, frame); $|\cdot|^2$ computes the squared magnitude; $S(z)$ computes the $K$-point averaging filter; and finally, the output is possibly downsampled by $S$. (TBD: s should be S, K should be N, N should be M.)
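To make the main-lobe/side-lobe trade-off concrete, the sketch below evaluates the DTFT of the rectangular, triangular and Hamming windows of length $M = 31$ on a dense grid via a zero-padded FFT, locates the first null (the edge of the main lobe), and reports the peak side-lobe level relative to the DC value. It uses NumPy's `np.bartlett` and `np.hamming` as the triangular and Hamming windows; note that `np.bartlett` has zero end points, so its nonzero part is slightly shorter than $M$. The helper name `lobe_stats` is ours.

```python
import numpy as np

def lobe_stats(w, nfft=8192):
    """Return (first null in radians, peak side-lobe level in dB) of window w."""
    mag = np.abs(np.fft.rfft(w, nfft))        # |DTFT| sampled on [0, pi]
    k = int(np.argmax(np.diff(mag) >= 0))     # first local minimum: end of the main lobe
    first_null = 2 * np.pi * k / nfft
    side_db = 20 * np.log10(mag[k:].max() / mag[0])
    return first_null, side_db

M = 31
windows = {
    "rectangular": np.ones(M),
    "triangular": np.bartlett(M),             # zero end points in NumPy's definition
    "hamming": np.hamming(M),
}
for name, w in windows.items():
    null, side = lobe_stats(w)
    print(f"{name:12s} first null = {null / (2 * np.pi / M):.2f} x 2pi/M, "
          f"peak side lobe = {side:6.1f} dB")
```

The rectangular window should show the narrowest main lobe (first null near $2\pi/M$) with side lobes around $-13$ dB, while the triangular and Hamming windows roughly double the main-lobe width in exchange for much lower side lobes.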
Instead of computing the DFT on adjacent, but nonoverlapping, blocks of size $M$ as in (8.20), we can allow for overlap between adjacent blocks, for example (assume for simplicity that $M$ is even),

$$b_n = [\,x_{nM/2}\;\; x_{nM/2+1}\;\; x_{nM/2+2}\;\; \cdots\;\; x_{nM/2+M-1}\,]^T, \qquad (8.26)$$

a 50% overlap, shown in Figure 8.8 using a triangular window for illustration. In general, the windows move by $N \le M$ samples.

Filter-Bank Implementation The estimation process we just discussed has a natural filter-bank implementation. Consider a prototype filter $g_0 = p$ (we assume a symmetric window, so time reversal is not an issue), and construct an $M$-channel complex exponential-modulated filter bank as in Figure 8.1 and (8.16). The prototype filter computes the windowing, and the modulation computes the DFT. With the sampling factor $N = M$, we get a critically sampled complex exponential-modulated filter bank. The sampling factor $N$ can be smaller than $M$, in which case the resulting filter bank implements a frame, discussed in Chapter 10 (see Figure 10.14). Squaring the output computes a local approximation to the power spectral density. Averaging $K$ successive outputs computes the averaged periodogram, accomplished with a $K$-point averaging filter on each of the $M$ filter-bank outputs, $S(z) = \frac{1}{K}\sum_{m=0}^{K-1} z^{-m}$. Finally, the output of the averaging filters may be downsampled by a factor $S \le K$. We have thus constructed a versatile device to compute local power spectral density, summarized in Figure 8.9 and Table 8.1.
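The equivalence between the block-based view and the filter-bank view of Figure 8.9 can be checked directly. The sketch below builds the $M$ channel filters $g_{i,n} = p_n W_M^{-in}$ from a symmetric window $p$, filters, downsamples by $N$, squares, applies the $K$-point averaging filter $S(z)$ and downsamples by $S$. For a symmetric window, each channel output equals the DFT of the windowed block up to a unit-modulus phase factor, so the squared magnitudes coincide with those of the windowed, overlapping block periodogram computed directly. Function names and parameter values are ours.

```python
import numpy as np

def filter_bank_psd(x, p, N, K=1, S=1):
    """Local PSD with the structure of Figure 8.9 (a sketch).

    p: symmetric prototype window of length M (M channels); N: downsampling
    factor; K, S: averaging length and output downsampling. Only outputs whose
    window lies fully inside x are kept.
    """
    M = len(p)
    k = np.arange(M)
    rows = []
    for i in range(M):
        g = p * np.exp(2j * np.pi * i * k / M)            # g_i[k] = p[k] W_M^{-ik}
        y = np.convolve(x, g)[M - 1:len(x)][::N]          # filter, downsample by N
        a = np.abs(y) ** 2                                # squared magnitude
        a = np.convolve(a, np.ones(K) / K)[K - 1:len(a)]  # S(z) = (1/K) sum_{m<K} z^{-m}
        rows.append(a[::S])                               # downsample by S
    return np.array(rows)                                 # shape (M, number of outputs)

def windowed_block_psd(x, p, N):
    """Reference: |DFT(p * block)|^2 for blocks starting at 0, N, 2N, ..."""
    M = len(p)
    starts = range(0, len(x) - M + 1, N)
    return np.array([np.abs(np.fft.fft(p * x[s:s + M])) ** 2 for s in starts]).T

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
p = np.bartlett(32)                     # symmetric triangular window, M = 32
N = len(p) // 2                         # 50% overlap, as in (8.26)
print(np.allclose(filter_bank_psd(x, p, N), windowed_block_psd(x, p, N)))
```

With `N = len(p) // 2` and `np.bartlett`, this reproduces the 50% overlap with a triangular window of Figure 8.8; `N = M` with `np.ones(M)` gives back the plain block periodogram of (8.21).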

The discussion so far focused on nonparametric spectral estimation, that is, we assumed no special structure for the underlying power spectral density. When a deterministic sequence, like a sinusoid, is buried in noise, we have a parametric spectral estimation problem, since we have prior knowledge of the underlying deterministic sequence. While the periodogram can be used here as well (with the effect of windowing now spreading the sinusoid), there exist powerful parametric estimation methods specifically tailored to this problem (see Exercise 8.8).

Parameter          Filter-Bank Operation    Computes
$M$                Number of channels       Number of frequency bins (frequency resolution of the analysis)
$G_0(z) = P(z)$    Prototype filter         Windowing
$N$                Downsampling factor      Overlap between adjacent blocks ($N = M$, basis; $N < M$, frame)
$S(z)$             Channel filter           Averaging and variance reduction

Table 8.1: Complex exponential-modulated filter-bank implementation of block-based power spectral density estimation.

Figure 8.10: A transmultiplexer modulates $N$ sequences into a single sequence $x$ of $N$-times higher bandwidth, and analyzes it into $N$ channels.

8.3.3 Application to Communications 

Transmultiplexers¹¹⁸ are used extensively in communication systems. They are at the heart of orthogonal frequency division multiplexing (OFDM), a modulation scheme popular both in mobile communications and in local wireless broadband systems such as IEEE 802.11 (Wi-Fi).

As we have seen in Chapter 7 and Solved Exercise 7.7, a transmultiplexer exchanges the order of analysis/synthesis banks. If the filters used are the complex exponential-modulated ones as we have seen above, transmultiplexers become computationally efficient. For example, consider $N$ sequences $\alpha_i$, $i = 0, 1, \ldots, N-1$, entering an $N$-channel complex exponential-modulated synthesis filter bank to produce a synthesized sequence $x$. Analyzing $x$ with an $N$-channel complex exponential-modulated analysis filter bank should yield again the $N$ sequences $\alpha_i$, $i = 0, 1, \ldots, N-1$ (Figure 8.10).

118 Such devices were used to modulate a large number of phone conversations onto large-bandwidth transatlantic cables.

Similarly to what we have seen earlier, either length-$N$ filters (from the DFT (8.2)) or sinc filters (8.6) lead to a basis. Using a good lowpass filter leads to approximate reconstruction (see Exercise 8.9 for an exploration of the end-to-end behavior).

In typical communication scenarios, a desired signal is sent over a channel with impulse response $c(t)$ (or equivalently $c_n$ in a bandlimited and sampled case), and thus the received signal is the input convolved with the channel. Often, the effect of the channel needs to be canceled, a procedure called channel equalization. If the channel is LSI, this equalization can be performed in the Fourier domain, assuming $c_n$ is known (either a priori or measured). This is a first motivation for using a Fourier-like decomposition. Moreover, as complex sinusoids are eigensignals of LSI systems, such complex sinusoids (or approximations thereof) are good candidates for signaling over a known LSI channel. Namely, an input sinusoid of frequency $\omega_0$ and a known amplitude $A$ (or a set of possible amplitudes $\{A_k\}$) will come out of the channel as a sinusoid, scaled by the channel frequency response at $\omega_0$, and perturbed by additive channel noise present at that frequency. Digital communication amounts to being able to distinguish a certain number of signaling waveforms per unit of time, given a constraint on the input (such as maximum power).

It turns out that an optimal way to communicate over an LSI channel with additive Gaussian noise is precisely to use Fourier-like waveforms. While an ideal system would require a very large number of perfect bandpass channels, practical systems use a few hundred channels (for example, 256 or 512). Moreover, instead of perfect bandpass filters (which require sinc filters), approximate bandpass filters based on finite windows are used. This time localization also allows the system to adapt to a changing channel, for example, in mobile communications.

The system is summarized in Figure 8.11 for $M$ channels upsampled by $N$.¹¹⁹ Such a device, which puts $M$ sequences $\{x_i\}_{i=0,\ldots,M-1}$ onto a single channel of $N$-times larger bandwidth, has been historically known as a transmultiplexer.

When the prototype filter is a rectangular window of length $M$, in the absence of channel effects, the synthesis/analysis complex exponential-modulated filter bank is perfect reconstruction. When a channel is present, and the prototype filter is a perfect lowpass filter of bandwidth $[-\pi/M, \pi/M)$, each bandpass channel is affected by the channel independently of the others, and can be individually equalized.
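The perfect-reconstruction claim for a rectangular prototype can be checked in the critically sampled case, where the number of channels equals the upsampling factor (a single $N$ below). In this sketch the synthesis bank upsamples each channel by $N$ and filters with $g_{i,n} = p_n W_N^{-in}$, and the analysis bank uses the time-reversed, conjugated filters followed by downsampling; the prototype $p_n = 1/\sqrt{N}$, $n = 0, \ldots, N-1$, is our choice of normalization, and no channel is modeled. Function names are ours.

```python
import numpy as np

def tmux_synthesis(alpha, p):
    """alpha: (N, M) array, row i = channel sequence. Upsample by N, filter, sum."""
    N, M = alpha.shape
    n = np.arange(len(p))
    x = np.zeros(N * (M - 1) + len(p), dtype=complex)
    for i in range(N):
        up = np.zeros(N * (M - 1) + 1, dtype=complex)
        up[::N] = alpha[i]                                           # upsampling by N
        x += np.convolve(up, p * np.exp(2j * np.pi * i * n / N))     # g_i = p W_N^{-in}
    return x

def tmux_analysis(x, p, N, M):
    """Matched (time-reversed, conjugated) filters, keep samples len(p) - 1 + m N."""
    n = np.arange(len(p))
    out = np.zeros((N, M), dtype=complex)
    for i in range(N):
        g = p * np.exp(2j * np.pi * i * n / N)
        c = np.convolve(x, np.conj(g[::-1]))
        out[i] = c[len(p) - 1::N][:M]                                # downsampling by N
    return out

N, M = 4, 10                           # N channels, critically sampled
rng = np.random.default_rng(4)
alpha = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
p = np.ones(N) / np.sqrt(N)            # rectangular prototype of length N
x = tmux_synthesis(alpha, p)
print(np.allclose(tmux_analysis(x, p, N, M), alpha))
```

With a rectangular prototype of length $N$, each block of $N$ output samples is simply a scaled inverse DFT of one vector of channel samples, which is why the analysis DFT undoes it exactly.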

For filters with finite impulse response, one can either use a narrower-band 
prototype (so neighboring channels do not interact), or, use fewer than the critical 
number of channels, which in both cases means some redundancy is left in the 
synthesized sequence that enters the channel. 



119 Again, if $M > N$, such a filter bank implements a frame, discussed in Chapter 10.

Figure 8.11: Communication over a channel using a complex exponential-modulated transmultiplexer. When $M = N$, it is critically sampled, while for $M > N$, the sequence entering the channel is redundant.

8.4 Cosine-Modulated Local Fourier Bases 

A possible escape from the restriction imposed by the Balian-Low theorem is to replace complex-exponential modulation (multiplication by $W_N^{in} = e^{-j2\pi in/N}$) with an appropriate cosine modulation. This has an added advantage that all filters are real if the prototype is real.

Cosine Modulation Given a prototype filter $p$, all of the filters are obtained via cosine modulation:

$$\begin{aligned}
g_{i,n} &= p_n \cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)n + \theta_i\Big), \\
G_i(z) &= \frac{1}{2}\Big[e^{j\theta_i} P\big(W_{2N}^{(i+1/2)} z\big) + e^{-j\theta_i} P\big(W_{2N}^{-(i+1/2)} z\big)\Big], \\
G_i(e^{j\omega}) &= \frac{1}{2}\Big[e^{j\theta_i} P\big(e^{j(\omega - (2\pi/2N)(i+1/2))}\big) + e^{-j\theta_i} P\big(e^{j(\omega + (2\pi/2N)(i+1/2))}\big)\Big],
\end{aligned} \qquad (8.27)$$

for $i = 0, 1, \ldots, N-1$, where $\theta_i$ is a phase factor that gives us flexibility in designing the representation; you may assume it to be zero for now. Compare the above with (8.16) for the complex-exponential modulation; the difference is that, given a real prototype filter, all the other filters are real. Moreover, the effective bandwidth, $2\pi/N$ in the case of complex-exponential modulation, is $\pi/N$ here. The difference occurs because the cosine-modulated filters, being real, have two side lobes, which halves the bandwidth per side lobe. The modulation frequencies follow from an even coverage of the interval $[0, \pi]$ with side lobes of width $\pi/N$. This is illustrated in Figure 8.12 for $N = 6$, for both complex and cosine modulation.
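The frequency-domain form of (8.27) can be checked numerically. The sketch below modulates an arbitrary real prototype (a Hann window here, our choice) with frequency $(2\pi/2N)(i+1/2)$ and phase $\theta_i$, and compares the DFT of $g_i$ with the two shifted copies of $P$ predicted by (8.27). The DFT length is a multiple of $4N$ so that the shift $(2\pi/2N)(i+1/2)$ is an integer number of bins; $N$, $i$ and $\theta_i$ are example values.

```python
import numpy as np

N, i, theta = 6, 2, 0.3                # example values (our choice)
p = np.hanning(4 * N)                  # any real prototype works here (our choice)
n = np.arange(len(p))
a = 2 * np.pi / (2 * N) * (i + 0.5)    # modulation frequency (2 pi / 2N)(i + 1/2)
g = p * np.cos(a * n + theta)          # g_{i,n} from (8.27)

nfft = 48 * N                          # multiple of 4N: the shift a is a whole number of bins
s = nfft * (2 * i + 1) // (4 * N)      # a * nfft / (2 pi), here exactly 12 (2i + 1)
P = np.fft.fft(p, nfft)                # samples of P(e^{jw}) at w_k = 2 pi k / nfft
G = np.fft.fft(g, nfft)
# (8.27): G_i = (1/2) [e^{j theta} P(e^{j(w - a)}) + e^{-j theta} P(e^{j(w + a)})]
G_formula = 0.5 * (np.exp(1j * theta) * np.roll(P, s) + np.exp(-1j * theta) * np.roll(P, -s))
print(np.allclose(G, G_formula))
```

The two terms are the copies of the prototype spectrum centered at $\pm(2i+1)\pi/(2N)$, which is the picture in Figure 8.12(b).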

Will such a modulation lead to an orthonormal basis? One possibility is to choose an ideal lowpass filter as prototype, with support $[-\pi/2N, \pi/2N)$. However, as we know, this leads to a sinc-like basis with infinite and slowly decaying impulse responses. Another solution is a block transform, such as the discrete cosine transform (DCT) discussed in Solved Exercise 8.3, whose basis sequences are too short for an interesting analysis. Fortunately, other solutions exist, with FIR filters of length longer than $N$, which we introduce next.

Figure 8.12: Complex exponential-modulated versus cosine-modulated filter bank with $N = 6$. (a) In the complex case, the bandwidth of the prototype is $2\pi/6$, and the center frequencies are $2\pi i/6$, $i = 0, 1, \ldots, 5$. (b) In the cosine case, the bandwidth of the prototype is $2\pi/12$, and the center frequencies are $(2i+1)\pi/12$, $i = 0, 1, \ldots, 5$. Unlike in (a), the first filter does not correspond to the prototype, but is modulated to $\pi/12$.



8.4.1 Lapped Orthogonal Transforms 

The earliest example of such cosine-modulated bases was developed for filters of length $2N$, implying that the nonzero support of the basis sequences overlaps by $N/2$ on each side of a block of length $N$ (see Figure 8.13), earning them the name lapped orthogonal transforms (LOT).

LOTs with a Rectangular Prototype Window 

Consider $N$ filters $g_0, g_1, \ldots, g_{N-1}$ of length $2N$ given by (8.27). We start with a rectangular prototype window filter:

$$p_n = \frac{1}{\sqrt{N}}, \qquad n = 0, 1, \ldots, 2N-1, \qquad (8.28)$$

where, by choosing $p_n = 1/\sqrt{N}$, we ensured that $\|g_i\| = 1$ for all $i$.
Figure 8.13: LOT for $N = 8$. The filters are of length $2N = 16$, and thus, they overlap with their nearest neighbors by $N/2 = 4$ (only $g_0$ is shown). Tails are orthogonal since the left half (red) is symmetric and the right half (orange) is antisymmetric.

As we always do, we first find conditions so that the set $\{g_{i,n-Nk}\}_{k\in\mathbb{Z},\, i\in\{0,1,\ldots,N-1\}}$ is an orthonormal set. To do that, we verify (8.7) and (8.9).



Orthogonality of a Single Filter To prove that a single filter $g_i$ is orthogonal to its shifts by $N$, it is enough to prove this for just two neighboring shifts (as the length of the filters is $2N$; see Figure 8.13). An easy way to force this orthogonality would be if the left half (tail) of the filter support ($n = 0, 1, \ldots, N-1$) were symmetric around its midpoint $(N-1)/2$, while the right half (tail) of the filter support ($n = N, N+1, \ldots, 2N-1$) were antisymmetric around its midpoint $(3N-1)/2$. Then, the inner product $\langle g_{i,n}, g_{i,n-N}\rangle$ would amount to the inner product of the right tail of $g_{i,n}$ with the left tail of $g_{i,n-N}$, and would automatically be zero as a product of a symmetric sequence with an antisymmetric sequence. The question is whether we can force such conditions on all the filters. Fortunately, we have a degree of freedom per filter given by $\theta_i$, which we choose to be

$$\theta_i = -\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\frac{N-1}{2}. \qquad (8.29)$$

After substituting it into (8.27), we get

$$g_{i,n} = \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(n-\frac{N-1}{2}\Big)\Big). \qquad (8.30)$$

We now check the symmetries of the tails; for the left tail,

$$g_{i,N-n-1} = \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(-n+\frac{N-1}{2}\Big)\Big) \overset{(a)}{=} \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(n-\frac{N-1}{2}\Big)\Big) = g_{i,n}, \qquad (8.31a)$$
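for $n = 0, 1, \ldots, N/2-1$, that is, it is indeed symmetric. In the above, (a) follows from the symmetry of the cosine function. Similarly, for the right tail,

$$\begin{aligned}
g_{i,2N-n-1} &= \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(-n+\frac{3N-1}{2}\Big)\Big) \\
&\overset{(a)}{=} \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(n-\frac{3N-1}{2}\Big)\Big) \\
&= \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(n+\frac{N+1}{2}\Big) - (2i+1)\pi\Big) \\
&= \frac{1}{\sqrt{N}}\cos\Big(\frac{2\pi}{2N}\Big(i+\frac{1}{2}\Big)\Big(n+\frac{N+1}{2}\Big) + \pi\Big) \\
&\overset{(b)}{=} -g_{i,N+n},
\end{aligned} \qquad (8.31b)$$

for $n = 0, 1, \ldots, N/2-1$, that is, it is indeed antisymmetric. In the above, (a) follows from the symmetry of the cosine function; the two middle steps use $(3N-1)/2 = 2N - (N+1)/2$ and the $2\pi$-periodicity of the cosine; and (b) follows from $\cos(\theta + \pi) = -\cos(\theta)$ together with (8.30), which gives $g_{i,N+n} = (1/\sqrt{N})\cos\big((2\pi/2N)(i+1/2)(n + (N+1)/2)\big)$. An LOT example with $N = 8$ is given in Figure 8.14.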

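Orthogonality of Filters We now turn our attention to showing that all the filters are orthogonal to each other (and their shifts). As we have done in (8.27), we use Euler's formula to express $g_i$ from (8.30) as

$$g_{i,n} = \frac{1}{2\sqrt{N}}\Big(W_{2N}^{(i+1/2)(n-(N-1)/2)} + W_{2N}^{-(i+1/2)(n-(N-1)/2)}\Big). \qquad (8.32)$$

The inner product between two different filters, $i \ne k$, is then:

$$\begin{aligned}
\langle g_i, g_k\rangle &= \frac{1}{4N}\sum_{n=0}^{2N-1}\Big(W_{2N}^{(i+1/2)(n-(N-1)/2)} + W_{2N}^{-(i+1/2)(n-(N-1)/2)}\Big)\Big(W_{2N}^{(k+1/2)(n-(N-1)/2)} + W_{2N}^{-(k+1/2)(n-(N-1)/2)}\Big) \\
&= \frac{1}{4N}\sum_{n=0}^{2N-1}\Big(W_{2N}^{(i+k+1)(n-(N-1)/2)} + W_{2N}^{(i-k)(n-(N-1)/2)} + W_{2N}^{-(i-k)(n-(N-1)/2)} + W_{2N}^{-(i+k+1)(n-(N-1)/2)}\Big).
\end{aligned}$$

To show that the above inner product is zero, we show that each of the four sums is zero. We show it for the first sum; the other three follow the same way:

$$\frac{1}{4N}\sum_{n=0}^{2N-1} W_{2N}^{(i+k+1)(n-(N-1)/2)} = \frac{1}{4N}\, W_{2N}^{-(i+k+1)(N-1)/2}\sum_{n=0}^{2N-1}\big(W_{2N}^{i+k+1}\big)^n = 0, \qquad (8.33)$$

because of the orthogonality of the roots of unity (2.277c): since $1 \le i+k+1 \le 2N-1$, $W_{2N}^{i+k+1} \ne 1$ is a $2N$th root of unity, and the sum of its $2N$ consecutive powers vanishes. For the two middle sums, the same argument applies to $W_{2N}^{\pm(i-k)}$, since $0 < |i-k| \le N-1$ when $i \ne k$.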
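A short numerical check of the results so far: the sketch below builds the $N$ filters of (8.30), confirms the symmetric left tails (8.31a) and antisymmetric right tails (8.31b), and verifies that the filters are orthonormal and orthogonal to the shifts by $N$ of all filters. The helper name `lot_rect` is ours.

```python
import numpy as np

def lot_rect(N):
    """Rows are the rectangular-window LOT filters g_i of (8.30), each of length 2N."""
    i = np.arange(N)[:, None]
    n = np.arange(2 * N)[None, :]
    return np.cos(2 * np.pi / (2 * N) * (i + 0.5) * (n - (N - 1) / 2)) / np.sqrt(N)

N = 8
g = lot_rect(N)
left, right = g[:, :N], g[:, N:]
print(np.allclose(left, left[:, ::-1]))       # left tails symmetric, (8.31a)
print(np.allclose(right, -right[:, ::-1]))    # right tails antisymmetric, (8.31b)
print(np.allclose(g @ g.T, np.eye(N)))        # unit norm and mutually orthogonal
print(np.allclose(right @ left.T, 0))         # g_i orthogonal to every g_k shifted by N
```

Because the filters have length $2N$, shifts by $2N$ or more do not overlap, so these checks cover all the inner products needed for orthonormality of $\{g_{i,n-Nk}\}$.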
Figure 8.14: LOT for $N = 8$ with a rectangular prototype window. The eight basis sequences (note the symmetric and antisymmetric tails) and their magnitude responses, showing the uniform split of the spectrum.



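Matrix View As usual, we find the matrix view of an expansion to be illuminating. We can write the LOT synthesis operator similarly to (7.7):

$$\Phi = \begin{bmatrix} \ddots & & & & \\ \ddots & G_0 & & & \\ & G_1 & G_0 & & \\ & & G_1 & G_0 & \\ & & & G_1 & \ddots \end{bmatrix}, \qquad (8.34)$$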
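where the columns of $G_0$ and $G_1$ are the left and right tails, respectively,

$$G_0 = \begin{bmatrix} g_{0,0} & g_{1,0} & \cdots & g_{N-1,0} \\ g_{0,1} & g_{1,1} & \cdots & g_{N-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ g_{0,N-1} & g_{1,N-1} & \cdots & g_{N-1,N-1} \end{bmatrix}, \qquad G_1 = \begin{bmatrix} g_{0,N} & g_{1,N} & \cdots & g_{N-1,N} \\ g_{0,N+1} & g_{1,N+1} & \cdots & g_{N-1,N+1} \\ \vdots & \vdots & \ddots & \vdots \\ g_{0,2N-1} & g_{1,2N-1} & \cdots & g_{N-1,2N-1} \end{bmatrix}. \qquad (8.35)$$

Since the expansion is orthonormal, $\Phi\Phi^T = I$, but also $\Phi^T\Phi = I$, or,

$$\begin{aligned}
G_0 G_0^T + G_1 G_1^T &= I, \qquad (8.36a)\\
G_1 G_0^T = G_0 G_1^T &= 0, \qquad (8.36b)\\
G_0^T G_0 + G_1^T G_1 &= I, \qquad (8.36c)\\
G_0^T G_1 = G_1^T G_0 &= 0. \qquad (8.36d)
\end{aligned}$$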


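Following the symmetry/antisymmetry of the tails, the matrices $G_0$ and $G_1$ have repeated (in $G_1$, negated) rows. For example, for $N = 4$,

$$G_0 = \begin{bmatrix} g_{0,0} & g_{1,0} & g_{2,0} & g_{3,0} \\ g_{0,1} & g_{1,1} & g_{2,1} & g_{3,1} \\ g_{0,1} & g_{1,1} & g_{2,1} & g_{3,1} \\ g_{0,0} & g_{1,0} & g_{2,0} & g_{3,0} \end{bmatrix} \quad\text{and}\quad G_1 = \begin{bmatrix} g_{0,4} & g_{1,4} & g_{2,4} & g_{3,4} \\ g_{0,5} & g_{1,5} & g_{2,5} & g_{3,5} \\ -g_{0,5} & -g_{1,5} & -g_{2,5} & -g_{3,5} \\ -g_{0,4} & -g_{1,4} & -g_{2,4} & -g_{3,4} \end{bmatrix}.$$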

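Denoting by $\hat{G}_0$ and $\hat{G}_1$ the upper halves of $G_0$ and $G_1$, respectively, we can express $G_0$ and $G_1$ as

$$G_0 = \begin{bmatrix} I_{N/2} \\ J_{N/2} \end{bmatrix}\hat{G}_0 \quad\text{and}\quad G_1 = \begin{bmatrix} I_{N/2} \\ -J_{N/2} \end{bmatrix}\hat{G}_1, \qquad (8.37)$$

where $I_{N/2}$ is an $N/2 \times N/2$ identity matrix and $J_{N/2}$ is an $N/2 \times N/2$ antidiagonal matrix (defined in Section 1.B.2). Note that $J_N^2 = I_N$, and that premultiplying by $J_N$ reverses the row order (postmultiplying reverses the column order).

From the above, both $G_0$ and $G_1$ have rank $N/2$. We can easily check that the rows of $\hat{G}_0$ and $\hat{G}_1$ form an orthogonal set, with norm $1/\sqrt{2}$. Using all of the above, we finally unearth the special structure of the LOTs:

$$G_0G_0^T = \begin{bmatrix} I_{N/2} \\ J_{N/2}\end{bmatrix}\hat{G}_0\hat{G}_0^T\begin{bmatrix} I_{N/2} & J_{N/2}\end{bmatrix} = \frac{1}{2}\begin{bmatrix} I_{N/2} \\ J_{N/2}\end{bmatrix}\begin{bmatrix} I_{N/2} & J_{N/2}\end{bmatrix} = \frac{1}{2}\begin{bmatrix} I_{N/2} & J_{N/2} \\ J_{N/2} & I_{N/2}\end{bmatrix} = \frac{1}{2}(I_N + J_N), \qquad (8.38a)$$

$$G_1G_1^T = \frac{1}{2}\begin{bmatrix} I_{N/2} & -J_{N/2} \\ -J_{N/2} & I_{N/2}\end{bmatrix} = \frac{1}{2}(I_N - J_N), \qquad (8.38b)$$

$$G_1^TG_0 = \hat{G}_1^T\begin{bmatrix} I_{N/2} & -J_{N/2}\end{bmatrix}\begin{bmatrix} I_{N/2} \\ J_{N/2}\end{bmatrix}\hat{G}_0 = \hat{G}_1^T\big(I_{N/2} - J_{N/2}^2\big)\hat{G}_0 = 0. \qquad (8.38c)$$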
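The block structure (8.37) and the identities (8.38) can also be verified numerically. This sketch forms $G_0$ and $G_1$ from (8.30) and (8.35) for $N = 8$, with $J_N$ built as a flipped identity matrix; variable names are ours.

```python
import numpy as np

N = 8
i = np.arange(N)[:, None]
n = np.arange(2 * N)[None, :]
g = np.cos(np.pi / N * (i + 0.5) * (n - (N - 1) / 2)) / np.sqrt(N)   # (8.30)

G0, G1 = g[:, :N].T, g[:, N:].T          # (8.35): column i holds the tails of g_i
I, J = np.eye(N), np.fliplr(np.eye(N))   # identity I_N and antidiagonal J_N
h = N // 2
G0_hat, G1_hat = G0[:h], G1[:h]          # upper halves

print(np.allclose(G0, np.vstack([G0_hat, np.flipud(G0_hat)])))    # (8.37), G_0
print(np.allclose(G1, np.vstack([G1_hat, -np.flipud(G1_hat)])))   # (8.37), G_1
print(np.allclose(G0 @ G0.T, (I + J) / 2))                         # (8.38a)
print(np.allclose(G1 @ G1.T, (I - J) / 2))                         # (8.38b)
print(np.allclose(G1.T @ G0, 0))                                   # (8.38c)
```

Here `np.flipud` plays the role of premultiplication by $J_{N/2}$, reversing the row order.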
Figure 8.15: LOT for $N = 8$ with a smooth, power-complementary prototype window from Table 8.2. Its (a) impulse response and (b) magnitude response. (c) The eight windowed basis sequences and their magnitude responses. Note the improved frequency resolution compared to Figure 8.14(b).

LOTs with a Nonrectangular Prototype Window

At this point, we have $N$ filters of length $2N$, but their impulse responses are simply rectangularly windowed cosine sequences. Such a rectangular prototype window is discontinuous at the boundary, and thus not desirable; instead we aim for smooth tapering at the boundary. We now investigate whether we can window our previous solution with a smooth prototype window and still retain orthogonality.
$p_0$        $p_1$        $p_2$        $p_3$        $p_4$        $p_5$        $p_6$        $p_7$
0.0887655    0.2366415    0.4238081    0.6181291    0.7860766    0.9057520    0.9715970    0.9960525

Table 8.2: Power-complementary prototype window used in Figure 8.15. The prototype window is symmetric, so only half of the coefficients are shown.



For this, we choose a power-complementary¹²⁰ real and symmetric prototype window sequence $p$, such that:

$$\begin{aligned}
p_{2N-n-1} &= p_n, \qquad (8.39a)\\
|p_n|^2 + |p_{N-n-1}|^2 &= 2, \qquad (8.39b)
\end{aligned}$$

for $n = 0, 1, \ldots, N-1$. Let

$$P_0 = \mathrm{diag}([p_0, p_1, \ldots, p_{N-1}]), \qquad P_1 = \mathrm{diag}([p_N, p_{N+1}, \ldots, p_{2N-1}]).$$

Then, (8.39) can be rewritten as

$$\begin{aligned}
P_1 &= J_N P_0 J_N, \qquad (8.40a)\\
P_0^2 + P_1^2 &= 2I. \qquad (8.40b)
\end{aligned}$$

The counterparts to (8.35) are now the windowed impulse responses

$$G_0' = P_0 G_0, \qquad G_1' = P_1 G_1 = J_N P_0 J_N G_1. \qquad (8.41)$$

Note that

$$P_0 J_N P_0 = P_1 J_N P_1. \qquad (8.42)$$



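These windowed impulse responses have to satisfy (8.36), or, (8.38) (substituting $G_i'$ for $G_i$). For example, we check the orthogonality of the tails (8.38c):

$$G_1'^T G_0' \overset{(a)}{=} G_1^T (J_N P_0 J_N) P_0 G_0 \overset{(b)}{=} \hat{G}_1^T \begin{bmatrix} I_{N/2} & -J_{N/2}\end{bmatrix} J_N P_0 J_N P_0 \begin{bmatrix} I_{N/2} \\ J_{N/2}\end{bmatrix}\hat{G}_0,$$

where (a) follows from (8.41), and (b) from (8.37). As the product $J_N P_0 J_N P_0$ is diagonal and symmetric (its $k$th diagonal entry is $p_k p_{N-1-k}$, which is unchanged when $k$ is replaced by $N-1-k$), we get

$$\begin{bmatrix} I_{N/2} & -J_{N/2}\end{bmatrix} J_N P_0 J_N P_0 \begin{bmatrix} I_{N/2} \\ J_{N/2}\end{bmatrix} = 0.$$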



120 The term power complementary is typically used to denote a filter whose magnitude response squared, added to the frequency-reversed version of the magnitude response squared, sums to a constant, as in (2.208). We use the term more broadly here to denote a sequence whose magnitude squared, added to the time-reversed version of the magnitude squared, sums to a constant.
To complete the orthogonality proof, we need to verify (8.36a) (with appropriate substitutions as above),

$$\begin{aligned}
G_0' G_0'^T + G_1' G_1'^T &\overset{(a)}{=} P_0 G_0 G_0^T P_0 + P_1 G_1 G_1^T P_1 \\
&\overset{(b)}{=} \frac{1}{2} P_0 (I_N + J_N) P_0 + \frac{1}{2} P_1 (I_N - J_N) P_1 \\
&= \frac{1}{2}\big(P_0^2 + P_1^2\big) + \frac{1}{2}\big(P_0 J_N P_0 - P_1 J_N P_1\big) \overset{(c)}{=} I,
\end{aligned}$$

where (a) follows from (8.41); (b) from (8.38a) and (8.38b); and (c) from (8.40b) and (8.42). An example of a windowed LOT is shown in Figure 8.15 for $N = 8$. The prototype window is symmetric of length 16, with coefficients as in Table 8.2.
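The windowed LOT can be checked against (8.36) using the coefficients of Table 8.2. One caveat, based on our own arithmetic: the tabulated values satisfy $p_n^2 + p_{N-1-n}^2 \approx 1$ rather than 2, so the sketch below scales them by $\sqrt{2}$ to meet (8.39b) with the $1/\sqrt{N}$ normalization of (8.30). Because the table is given to seven digits, the checks use a tolerance of $10^{-6}$.

```python
import numpy as np

N = 8
# First half of the prototype from Table 8.2. The tabulated values satisfy
# p_n^2 + p_{N-1-n}^2 ~= 1, so we scale by sqrt(2) to meet (8.39b) (our reading).
half = np.array([0.0887655, 0.2366415, 0.4238081, 0.6181291,
                 0.7860766, 0.9057520, 0.9715970, 0.9960525])
p = np.sqrt(2) * np.concatenate([half, half[::-1]])   # symmetric, length 2N, (8.39a)

i = np.arange(N)[:, None]
n = np.arange(2 * N)[None, :]
g = np.cos(np.pi / N * (i + 0.5) * (n - (N - 1) / 2)) / np.sqrt(N)   # (8.30)
gw = g * p                                   # windowed filters: p_n g_{i,n}

G0, G1 = gw[:, :N].T, gw[:, N:].T            # G'_0 = P_0 G_0, G'_1 = P_1 G_1, (8.41)
I = np.eye(N)
checks = {
    "(8.36a)": np.allclose(G0 @ G0.T + G1 @ G1.T, I, atol=1e-6),
    "(8.36b)": np.allclose(G1 @ G0.T, 0, atol=1e-6),
    "(8.36c)": np.allclose(G0.T @ G0 + G1.T @ G1, I, atol=1e-6),
    "(8.36d)": np.allclose(G0.T @ G1, 0, atol=1e-6),
}
print(checks)
```

Only (8.36a) and (8.36d) are derived in the text; (8.36b) and (8.36c) follow as well, since $G_1 G_0^T = 0$ already holds before windowing and the polyphase matrix is square.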

Shift-Varying LOT Filter Banks We end this section with a discussion of a variation on the theme of prototype windows, both for its importance in practice¹²¹ and because it shows the same basic principles at work. Assume one wants to process a sequence with an $N$-channel filter bank and then switch to a $2N$-channel filter bank. In addition, one would like a smooth rather than an abrupt transition. Interestingly, to achieve this, it is enough for the two adjacent prototype windows to have overlapping tails that are power complementary (see Figure 8.16). Calling $p^{(L)}$ and $p^{(R)}$ the two prototype windows involved, then

$$|p_n^{(L)}|^2 + |p_n^{(R)}|^2 = 2$$

leads again to orthogonality of the overlapping tails of the two filter banks.

8.4.2 Application to Audio Compression 



In Section 8.3.2 we have made numerous references to redundancy, which we will discuss in Chapter 10. In compression, the opposite is required: we want to remove the redundancy from the sequence as much as possible, and thus, typically, bases are used (in particular, orthonormal bases). While we will discuss compression in detail in Chapter 13, here we just discuss its main theme: a small number of transform coefficients should capture a large part of the energy of the original sequence. In audio compression, the following characteristics are important:

(i) The spectrum is often harmonic, with a few dominant spectral components.
(ii) The human auditory system exhibits a masking effect such that a large sinusoid masks neighboring smaller sinusoids.
(iii) Sharp transitions, or attacks, are a key feature of many instruments.

It is clear that (i) and (iii) are in contradiction. The former requires long prototype windows, giving fine frequency resolution, while the latter requires short prototype windows, giving fine time resolution at the expense of frequency resolution. The solution is to adapt the prototype window size, depending on the sequence content.



121 Audio coding schemes use this feature extensively, as it allows for switching the number of 
channels in a filter bank, and consequently, the time and frequency resolutions of the analysis. 
Figure 8.16: An example of the flexibility allowed by LOTs illustrated through different transitions from a 2-channel LOT to an 8-channel LOT. (a) Direct transition (both prototype windows have the same tails, a restriction on the 8-channel LOT as its prototype window must then be flat in the middle). (b) Transition using an asymmetric 4-channel LOT prototype window (allows for a greater flexibility in the 8-channel LOT prototype window). (c) Transition using an asymmetric 4-channel LOT prototype window and a symmetric 4-channel LOT prototype window. (d) Transition using several symmetric 4-channel LOT prototype windows (all the prototype windows are now symmetric and have the same tails).

Figure 8.17: Analysis of an audio segment using a cosine-modulated filter bank. (a) Time-domain sequence. (b) Tiling of the time-frequency plane, where shading indicates the square of the coefficient corresponding to the basis sequence situated at that specific time-frequency location.



Both for harmonic analysis and for windowing, including changing the size of the filter bank (we have just seen this), we use cosine-modulated filter banks similar to those from Section 8.4, creating an adaptive tiling of the time-frequency plane, with local frequency resolution in stationary, harmonic segments, and local time resolution in transition, or, attack phases. The best tiling is chosen based on optimization procedures that try to minimize the approximation error when keeping only a small number of transform coefficients (we discuss such methods in Chapter 13). Figure 8.17 gives an example of adaptive time-frequency analysis.

For actual compression, in addition to an adaptive representation, a number of other tricks come into play, related to perceptual coding (for example, masking), quantization and entropy coding, all specifically tuned to audio compression.¹²²



122 All of this is typically done offline, that is, on the recorded audio, rather than in real time. This allows for complex optimizations, potentially using trial and error, until a satisfactory solution is obtained.
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Figure 8.18: Impulse response of the prototype window sequence modulating the cosine- 
modulated filter bank used in MP3. 



Example 8.5 (Filter banks used in audio compression) The MPEG audio standard$^{123}$, often called MP3 in consumer products, uses a 32-channel filter
bank. It is not a perfect reconstruction filter bank; rather, it uses a symmetric
prototype window $p_n$ of length $L = 16N - 1$ (in this case, with $N = 32$, $L = 511$) with a symmetry around $n = 255$. The $i$th filter is obtained by modulation of the prototype
window as

$$g_{i,n} = p_n \cos\left(\frac{2\pi}{2N}\left(i + \frac{1}{2}\right)\left(n + \frac{N}{2}\right)\right), \qquad (8.43)$$

for $i = 0, 1, \ldots, N-1$. Comparing this to (8.30), we see that, except for the phase
factor, the cosine modulation is the same. Of course, the prototype window is
also different (its odd length hints at the phase difference). The impulse
response of the prototype window used in MP3 is displayed in Figure 8.18. Such a
filter bank is called pseudo-QMF, because nearest-neighbor aliasing is canceled as
in a classical two-channel filter bank.$^{124}$ While aliasing from other, more distant bands
is not automatically canceled, the prototype is a very good lowpass filter that suppresses it
almost perfectly. The input-output relationship is not perfect (unlike for LOTs),
but again, with a good prototype window, it is almost perfect.
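To see (8.43) at work, the following Python sketch builds the $N$ cosine-modulated filters from a prototype and measures how much of each filter's energy on $[0, \pi)$ falls in its nominal band $[i\pi/N, (i+1)\pi/N)$. The actual MP3 prototype window of Figure 8.18 is not given numerically in the text, so the sketch substitutes a hypothetical Hann-windowed sinc prototype of odd length $16N - 1$ with cutoff $\pi/(2N)$; the function names are illustrative only, and a small $N$ is used to keep the pure-Python DTFT evaluation fast.

```python
import cmath
import math


def cosine_modulated_bank(p, N):
    """Filters g_i[n] = p[n] * cos((2*pi/(2N)) * (i + 1/2) * (n + N/2)), i = 0..N-1, as in (8.43)."""
    return [
        [pn * math.cos(math.pi / N * (i + 0.5) * (n + N / 2)) for n, pn in enumerate(p)]
        for i in range(N)
    ]


def hann_sinc_prototype(N, taps_per_channel=16):
    """Hypothetical lowpass prototype (NOT the MP3 window): a Hann-windowed sinc with cutoff
    pi/(2N) and odd length L = taps_per_channel * N - 1, symmetric around (L - 1)/2."""
    L = taps_per_channel * N - 1
    center = (L - 1) / 2
    wc = math.pi / (2 * N)
    p = []
    for n in range(L):
        t = n - center
        ideal = wc / math.pi if t == 0 else math.sin(wc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (L + 1))
        p.append(window * ideal)
    return p


def band_energy_fraction(g, N, i, grid=1024):
    """Share of the energy of |G_i(e^{jw})|^2 on [0, pi) lying in [i*pi/N, (i+1)*pi/N)."""
    inside = total = 0.0
    for m in range(grid):
        w = math.pi * m / grid
        energy = abs(sum(gn * cmath.exp(-1j * w * n) for n, gn in enumerate(g))) ** 2
        total += energy
        if i * math.pi / N <= w < (i + 1) * math.pi / N:
            inside += energy
    return inside / total


if __name__ == "__main__":
    N = 4  # small example; the MP3 filter bank itself has N = 32 channels
    bank = cosine_modulated_bank(hann_sinc_prototype(N), N)
    for i, g in enumerate(bank):
        print(f"filter {i}: {band_energy_fraction(g, N, i):.3f} of its energy in band {i}")
```

With a good lowpass prototype, nearly all of each filter's energy stays in its own band, which is why the aliasing from distant bands, although not canceled, remains small.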

8.5 Computational Aspects 

The expressions for the synthesis and analysis complex exponential-modulated filter
banks in (8.19a) and (8.19b) (see an illustration with three channels in Figure 8.4)
lead to the corresponding fast algorithms given in Tables 8.3 and 8.4.

Complex Exponential-Modulated Filter Banks We now look into the cost of implementing the analysis filter bank; the cost of implementing the synthesis one is
the same, as the two are dual to each other. Consider a prototype filter p of length
L = NM; each polyphase component is then of length N.



is obtained. 

123 While MPEG is best known as a video standardization body, it also has a subgroup dealing with audio. Several
different versions of audio compression, of different complexity and quality, have been developed,
and the best of these, called Layer III, gave the acronym MP3.

124 QMF filters are discussed in Further Reading of Chapter 7 as well as in Exercise 7.19.
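ComplexModSynthesis(p, $\{\alpha_0, \ldots, \alpha_{N-1}\}$)

Input: The prototype filter p and N channel sequences $\{\alpha_0, \ldots, \alpha_{N-1}\}$.

Output: Original sequence x.

Decompose prototype p into its N polyphase components $p_{j,n} = p_{Nn+j}$

for all n do

  Fourier transform: Transform channel sequences with the scaled inverse DFT

  $$\begin{bmatrix} \alpha'_{0,n} \\ \alpha'_{1,n} \\ \vdots \\ \alpha'_{N-1,n} \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & W_N^{-1} & \cdots & W_N^{-(N-1)} \\ 1 & W_N^{-2} & \cdots & W_N^{-2(N-1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{-(N-1)} & \cdots & W_N^{-(N-1)^2} \end{bmatrix} \begin{bmatrix} \alpha_{0,n} \\ \alpha_{1,n} \\ \vdots \\ \alpha_{N-1,n} \end{bmatrix}$$

  Convolution: Convolve each sequence $\alpha'_{j,n}$ with the jth polyphase component of p

  $$\begin{bmatrix} x'_{0,n} \\ x'_{1,n} \\ \vdots \\ x'_{N-1,n} \end{bmatrix} = \begin{bmatrix} p_{0,n} & & & \\ & p_{1,n} & & \\ & & \ddots & \\ & & & p_{N-1,n} \end{bmatrix} * \begin{bmatrix} \alpha'_{0,n} \\ \alpha'_{1,n} \\ \vdots \\ \alpha'_{N-1,n} \end{bmatrix}$$

  Inverse polyphase transform: Upsample/interleave channel sequences to get $x_{Nn+j} = x'_{j,n}$

end for

return x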

Table 8.3: Fast implementation of a complex exponential-modulated synthesis filter bank. 
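The steps of Table 8.3 can be checked numerically. The Python sketch below assumes the modulated synthesis filters $g_{i,n} = p_n W_N^{-in}$ (as in Table 8.6), a finite causal prototype p, and N a power of two (for the small radix-2 FFT it includes). It compares the fast route (inverse DFT across channels, one short convolution per polyphase branch, interleaving) with direct upsampling and filtering. The function names are hypothetical, and the recursive FFT is a textbook stand-in for an optimized library routine.

```python
import cmath
import random


def fft(a):
    """Radix-2 FFT, len(a) a power of 2: A[i] = sum_j a[j] * W**(i*j) with W = exp(-2j*pi/len(a))."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + t
        out[i + n // 2] = even[i] - t
    return out


def direct_synthesis(alpha, p, N):
    """x_n = sum_i sum_k alpha[i][k] * g_{i, n - N k}, with g_{i,m} = p_m * W_N^(-i m)."""
    K = len(alpha[0])
    x = [0j] * (N * (K - 1) + len(p))
    for i in range(N):
        for k in range(K):
            for m, pm in enumerate(p):  # m = n - N k
                x[N * k + m] += alpha[i][k] * pm * cmath.exp(2j * cmath.pi * i * m / N)
    return x


def fast_synthesis(alpha, p, N):
    """Table 8.3: scaled inverse DFT across channels, per-phase convolution, interleaving."""
    K = len(alpha[0])
    phases = [p[j::N] for j in range(N)]  # p_{j,n} = p_{Nn+j}
    # alpha'_{j,k} = sum_i W_N^(-i j) alpha_{i,k}, obtained as conj(FFT(conj(alpha_{., k})))
    a_prime = [[0j] * K for _ in range(N)]
    for k in range(K):
        col = fft([alpha[i][k].conjugate() for i in range(N)])
        for j in range(N):
            a_prime[j][k] = col[j].conjugate()
    # x'_{j,n} = sum_k p_{j,n-k} alpha'_{j,k}, written directly into x_{Nn+j}
    x = [0j] * (N * (K - 1) + len(p))
    for j in range(N):
        for k in range(K):
            for l, c in enumerate(phases[j]):
                x[N * (k + l) + j] += c * a_prime[j][k]
    return x


if __name__ == "__main__":
    random.seed(0)
    N = 4
    p = [random.uniform(-1, 1) for _ in range(3 * N)]  # arbitrary prototype of length L = 3N
    alpha = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)] for _ in range(N)]
    err = max(abs(a - b) for a, b in zip(direct_synthesis(alpha, p, N), fast_synthesis(alpha, p, N)))
    print(f"max |direct - fast| = {err:.2e}")
```

The two outputs agree up to rounding error; the fast version replaces N full-length modulated filters by one DFT per time index and N filters of length L/N.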

First, we need to compute M convolutions, but on polyphase components of
the input sequence, that is, at a rate M times slower. This is equivalent to a single
convolution at full rate, or, of order O(N) operations per input sample. We then
use an FFT, again at the slower rate. From (2.261), an FFT requires of the order
$O(\log_2 M)$ operations per input sample. In total, we have

$$C \sim \alpha \log_2 M + N \in O(\log_2 M), \qquad (8.44)$$

operations per input sample. This is very efficient, since simply taking a length-M
FFT for each consecutive block of M samples would already require $\log_2 M$ operations per input sample. Thus, the price of windowing given by the prototype filter
is of the order O(N) operations per input sample, or, the length of the prototype
window normalized per input sample. The value of N depends on the desired frequency selectivity; a typical value can be of order $O(\log_2 M)$. Exercise 8.4 looks
into the cost of a filter bank similar to those used in audio compression standards,
such as MPEG from Example 8.5.

What is the numerical conditioning of this algorithm? Both the polyphase transform and the FFT are unitary maps (the FFT up to a scale factor), so the key resides in the diagonal
matrix of polyphase components. While there are cases when this matrix is unitary (such as
in the block-transform case, where it is the identity), its conditioning is highly dependent on the
prototype window. See Exercise 8.5 for an exploration of this issue.
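The role of the diagonal polyphase matrix can be made concrete. Assuming analysis coefficients $\alpha_{i,k} = \langle x, g_{i,n-Nk}\rangle$ with $g_{i,n} = p_n W_N^{-in}$ and an unnormalized DFT (which scales energy by exactly N), the channel energy at frequency $\omega$ is $N\sum_j |P_j(e^{j\omega})|^2 |X_j(e^{j\omega})|^2$, so $N\min_{j,\omega}|P_j(e^{j\omega})|^2$ and $N\max_{j,\omega}|P_j(e^{j\omega})|^2$ bound the analysis map. The Python sketch below estimates these two numbers on a frequency grid; the function name is illustrative, and a grid only approximates the true minimum and maximum.

```python
import cmath
import math


def polyphase_bounds(p, N, grid=512):
    """Return (N * min, N * max) of |P_j(e^{jw})|^2 over j = 0..N-1 and a grid of w in [-pi, pi),
    where P_j(z) = sum_l p[N*l + j] z^(-l) is the jth polyphase component of the prototype p."""
    lo, hi = math.inf, 0.0
    for j in range(N):
        comp = p[j::N]
        for m in range(grid):
            w = -math.pi + 2 * math.pi * m / grid
            mag2 = abs(sum(c * cmath.exp(-1j * w * l) for l, c in enumerate(comp))) ** 2
            lo, hi = min(lo, N * mag2), max(hi, N * mag2)
    return lo, hi


if __name__ == "__main__":
    N = 4
    block = [1 / math.sqrt(N)] * N  # block-transform case: every P_j is the constant 1/sqrt(N)
    print(polyphase_bounds(block, N))  # both bounds equal 1: a blockwise DFT is unitary
```

A ratio of the two bounds close to 1 means a well-conditioned filter bank; a lower bound near 0 means some input components are almost lost in the channels.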
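ComplexModAnalysis(p, x)

Input: The prototype filter p and input x.

Output: N channel sequences $\{\alpha_0, \ldots, \alpha_{N-1}\}$.

Decompose prototype p into its N polyphase components $p_{j,n} = p_{Nn-j}$

for all n do

  Polyphase transform: Compute input sequence polyphase components $x_{j,n} = x_{Nn+j}$

  Convolution: Convolve polyphase components of prototype and input

  $$\begin{bmatrix} x'_{0,n} \\ x'_{1,n} \\ \vdots \\ x'_{N-1,n} \end{bmatrix} = \begin{bmatrix} p_{0,n} & & & \\ & p_{1,n} & & \\ & & \ddots & \\ & & & p_{N-1,n} \end{bmatrix} * \begin{bmatrix} x_{0,n} \\ x_{1,n} \\ \vdots \\ x_{N-1,n} \end{bmatrix}$$

  Fourier transform: Compute channel sequences by applying the forward DFT

  $$\begin{bmatrix} \alpha_{0,n} \\ \alpha_{1,n} \\ \vdots \\ \alpha_{N-1,n} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\ 1 & W_N^2 & W_N^4 & \cdots & W_N^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2} \end{bmatrix} \begin{bmatrix} x'_{0,n} \\ x'_{1,n} \\ \vdots \\ x'_{N-1,n} \end{bmatrix}$$

end for

return $\{\alpha_0, \ldots, \alpha_{N-1}\}$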

Table 8.4: Fast implementation of a complex exponential-modulated analysis filter bank. 
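Likewise for Table 8.4. The Python sketch below computes $\alpha_{i,k} = \langle x, g_{i,n-Nk}\rangle$ with $g_{i,n} = p_n W_N^{-in}$ once directly and once through polyphase components and a forward FFT. To avoid time-reversing the prototype, it writes the per-phase step as a correlation with $p_{Nl+j}$; this should be equivalent to the convolution with $p_{Nn-j}$ in the table if p there is read as the (conjugated) time-reversed analysis prototype. Only the channel samples $k \ge 0$ of a finite causal input are computed, N must be a power of two, and all names are hypothetical.

```python
import cmath
import random


def fft(a):
    """Radix-2 FFT, len(a) a power of 2: A[i] = sum_j a[j] * W**(i*j) with W = exp(-2j*pi/len(a))."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + t
        out[i + n // 2] = even[i] - t
    return out


def direct_analysis(x, p, N):
    """alpha[i][k] = <x, g_{i, n - N k}> with g_{i,m} = p_m * W_N^(-i m), k = 0..ceil(len(x)/N)-1."""
    K = -(-len(x) // N)

    def g(i, m):
        return p[m] * cmath.exp(2j * cmath.pi * i * m / N) if 0 <= m < len(p) else 0

    return [[sum(x[n] * g(i, n - N * k).conjugate() for n in range(len(x))) for k in range(K)]
            for i in range(N)]


def fast_analysis(x, p, N):
    """Table 8.4 with the per-phase step written as a correlation with p[N*l + j]."""
    K = -(-len(x) // N)

    def xs(n):  # input extended by zeros outside its support
        return x[n] if 0 <= n < len(x) else 0

    phases = [p[j::N] for j in range(N)]  # p[N*l + j]
    alpha = [[0j] * K for _ in range(N)]
    for k in range(K):
        # v_j = sum_l conj(p[N*l + j]) * x[N*(k + l) + j]: one short filter per input phase x_{Nn+j}
        v = [sum(c.conjugate() * xs(N * (k + l) + j) for l, c in enumerate(phases[j]))
             for j in range(N)]
        col = fft(v)  # forward DFT over the phase index j yields all N channels at time k
        for i in range(N):
            alpha[i][k] = col[i]
    return alpha


if __name__ == "__main__":
    random.seed(0)
    N = 4
    x = [random.gauss(0, 1) for _ in range(16)]
    p = [random.uniform(-1, 1) for _ in range(3 * N)]  # arbitrary prototype of length 3N
    a_direct, a_fast = direct_analysis(x, p, N), fast_analysis(x, p, N)
    err = max(abs(a - b) for ra, rb in zip(a_direct, a_fast) for a, b in zip(ra, rb))
    print(f"max |direct - fast| = {err:.2e}")
```

With the block prototype $p_n = 1/\sqrt{N}$, $n = 0, \ldots, N-1$, the same routine reduces to a blockwise DFT and preserves energy.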



Chapter at a Glance 

Our goal in this chapter was twofold: (1) to extend the discussion from Chapter 7 to more
than two channels and associated bases; and (2) to consider those filter banks implementing
local Fourier bases.

The extension to N channels, while not difficult, is a bit more involved as we now
deal with more general matrices, and, in particular, N × N matrices of polynomials. Many
of the expressions are analogous to those seen in Chapter 7; we went through them in some
detail for orthogonal N-channel filter banks, as the biorthogonal ones are similar.

General, unstructured N-channel filter banks are rarely seen in practice; instead,
N-channel modulated filter banks are widespread because of (1) their close connection
to local Fourier representations, (2) their computational efficiency (modulated filter banks are
implemented using FFTs), and (3) the fact that only a single prototype filter needs to be designed.

We studied uniformly modulated filter banks using both complex exponentials and
cosines. The former, while directly linked to a local Fourier series (indeed, when
the filter length is N, we have a blockwise DFT), is hampered by a negative result, the Balian-Low theorem, which prohibits good orthonormal bases. The latter, with proper design of
the prototype filter, leads to good, orthonormal, local cosine bases (LOTs). These are
popular in audio and image processing using a prototype filter of length L = 2N.

To showcase their utility, we looked at the use of complex exponential-modulated
filter banks in power spectral density estimation, communications (OFDM), and transmultiplexing (Wi-Fi), as well as that of cosine-modulated ones in audio compression (MP3).
N-Channel Filter Bank

Block diagram: [figure: N-channel analysis and synthesis filter banks with sampling by N]

Basic characteristics
  number of channels:        M = N
  sampling factor:           N
  channel sequences:         $\alpha_{i,n}$,  $i = 0, 1, \ldots, N-1$

Filters                      Synthesis           Analysis
  orthogonal filter i:       $g_{i,n}$           $g^*_{i,-n}$,            $i = 0, 1, \ldots, N-1$
  biorthogonal filter i:     $g_{i,n}$           $\tilde{g}^*_{i,-n}$
  polyphase component j:     $g_{i,j,n}$         $\tilde{g}_{i,j,n}$,     $j = 0, 1, \ldots, N-1$

Table 8.5: N-channel filter bank.



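Local Fourier Modulated Filter Bank

Filters               Complex-exponential modulation           Cosine modulation
$g_{i,n}$             $p_n W_N^{-in}$                           $p_n \cos\left(\frac{2\pi}{2N}\left(i+\frac{1}{2}\right)n + \theta_i\right)$
$G_i(z)$              $P(W_N^{i} z)$                            $\frac{1}{2}\left[e^{j\theta_i}P(W_{2N}^{i+1/2}z) + e^{-j\theta_i}P(W_{2N}^{-(i+1/2)}z)\right]$
$G_i(e^{j\omega})$    $P(e^{j(\omega - (2\pi/N)i)})$            $\frac{1}{2}\left[e^{j\theta_i}P(e^{j(\omega - (2\pi/2N)(i+1/2))}) + e^{-j\theta_i}P(e^{j(\omega + (2\pi/2N)(i+1/2))})\right]$

Table 8.6: Local Fourier bases with complex and cosine modulation.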
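N-Channel Orthogonal Filter Bank

Relationship between filters
  Time domain:        $\langle g_{i,n}, g_{j,n-Nk}\rangle_n = \delta_{i-j}\,\delta_k$
  Matrix domain:      $D_N G_j^* G_i U_N = \delta_{i-j} I$
  z domain:           $\sum_{k=0}^{N-1} G_i(W_N^k z)\, G_{j*}(W_N^{-k} z^{-1}) = N\delta_{i-j}$
  DTFT domain:        $\sum_{k=0}^{N-1} G_i(e^{j(\omega - (2\pi/N)k)})\, G_j^*(e^{j(\omega - (2\pi/N)k)}) = N\delta_{i-j}$
  Polyphase domain:   $\sum_{k=0}^{N-1} G_{i,k}(z)\, G_{j,k*}(z^{-1}) = \delta_{i-j}$

Basis sequences
  Time domain:        $\{g_{i,n-Nk}\}_{i=0,\ldots,N-1,\; k\in\mathbb{Z}}$
  Frequency domain:   $\{G_i(z)\}_{i=0,\ldots,N-1}$

Filters
  Synthesis:          $g_{i,n}$, $G_i(z)$, $G_i(e^{j\omega})$
  Analysis:           $g^*_{i,-n}$, $G_{i*}(z^{-1})$, $G_i^*(e^{j\omega})$

Matrix view
  Basis, time domain:  $\Phi = \begin{bmatrix} \cdots & g_{0,n-Nk} & g_{1,n-Nk} & \cdots & g_{N-1,n-Nk} & \cdots \end{bmatrix}$
  z domain:            $\Phi(z) = \begin{bmatrix} G_0(z) & G_1(z) & \cdots & G_{N-1}(z) \\ G_0(W_N z) & G_1(W_N z) & \cdots & G_{N-1}(W_N z) \\ \vdots & \vdots & \ddots & \vdots \\ G_0(W_N^{N-1} z) & G_1(W_N^{N-1} z) & \cdots & G_{N-1}(W_N^{N-1} z) \end{bmatrix}$
  DTFT domain:         $\Phi(e^{j\omega}) = \begin{bmatrix} G_0(e^{j\omega}) & G_1(e^{j\omega}) & \cdots & G_{N-1}(e^{j\omega}) \\ G_0(W_N e^{j\omega}) & G_1(W_N e^{j\omega}) & \cdots & G_{N-1}(W_N e^{j\omega}) \\ \vdots & \vdots & \ddots & \vdots \\ G_0(W_N^{N-1} e^{j\omega}) & G_1(W_N^{N-1} e^{j\omega}) & \cdots & G_{N-1}(W_N^{N-1} e^{j\omega}) \end{bmatrix}$
  Polyphase domain:    $\Phi_p(z) = \begin{bmatrix} G_{0,0}(z) & G_{1,0}(z) & \cdots & G_{N-1,0}(z) \\ G_{0,1}(z) & G_{1,1}(z) & \cdots & G_{N-1,1}(z) \\ \vdots & \vdots & \ddots & \vdots \\ G_{0,N-1}(z) & G_{1,N-1}(z) & \cdots & G_{N-1,N-1}(z) \end{bmatrix}$

Constraints           Orthogonality relations                        Perfect reconstruction
  Time domain:        $\Phi^*\Phi = I$                                 $\Phi\Phi^* = I$
  z domain:           $\Phi^*(z^{-1})\,\Phi(z) = NI$                   $\Phi(z)\,\Phi^*(z^{-1}) = NI$
  DTFT domain:        $\Phi^*(e^{j\omega})\,\Phi(e^{j\omega}) = NI$     $\Phi(e^{j\omega})\,\Phi^*(e^{j\omega}) = NI$
  Polyphase domain:   $\Phi_p^*(z^{-1})\,\Phi_p(z) = I$                $\Phi_p(z)\,\Phi_p^*(z^{-1}) = I$

Table 8.7: Properties of an orthogonal N-channel filter bank. This table is the N-channel
counterpart to the two-channel Table 7.9.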



a3.0 [October 2011] CC by-nc-nd

Comments to book-errata@FourierAndWavelets.org

Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal




Historical Remarks 

The earliest application of a local Fourier analysis was by Dennis Gabor to the analysis 
of speech [55] . The idea of a local spectrum, or periodogram, was studied and refined by 
statisticians interested in time series of sun spots, floods, temperatures, and many others. 
It led to the question of windowing the data. Blackman, Tukey, Hamming, among others, 
worked on window designs, while the question of smoothing was studied by Bartlett and 
Welch, producing windowed and smoothed periodograms. 

For compression, especially speech and audio, real modulated local Fourier filter banks with perfect or almost perfect
reconstruction appeared in the 1980's and 1990's. Nussbaumer,
Rothweiler and others proposed pseudo-QMF filter banks, with
nearly perfect reconstruction, frequency-selective filters and high
computational efficiency. This type of filter bank is used today
in most audio coding standards, such as MP3. A different approach, leading to shorter
filters and LOTs, was championed by Malvar, Princen and Bradley, among others. These
are popular in image processing, where frequency selectivity is not as much of a concern.






Frequency division multiplexing has been a popular communications method since the 1960's, and its digital version led to
complex exponential-modulated transmultiplexers with FFTs, as
proposed by Bellanger and co-workers. That perfect transmultiplexing is possible was pointed out by Vetterli. Multicarrier
frequency signaling, which relies on efficient complex exponential-modulated transmultiplexers, is one of the main communications methods, with orthogonal frequency division
multiplexing (OFDM) being at the heart of many standards (for example, Wi-Fi, 802.11).

Further Reading 

Books and Textbooks For a general treatment of N-channel filter banks, see the books
by Vaidyanathan [158], Vetterli and Kovačević [167], Strang and Nguyen [143], among
others. For modulated filter banks, see [158] as well as Malvar's book [101], the latter with
a particular emphasis on LOTs. For a good basic discussion of periodograms, see Porat's
book on signal processing [114], while a more advanced treatment of spectral estimation
can be found in Porat's book on statistical signal processing [113], and Stoica and Moses'
book on spectral estimation [138].

Design of N-Channel Filter Banks General N-channel filter bank designs were investigated in [156, 157]. The freedom in design offered by more channels shows in examples such as linear-phase, orthogonal FIR solutions, not possible in the two-channel
case [168, 136].



Exercises with Solutions 

8.1. Orthogonal Bases for $\ell^2(\mathbb{Z})$

Let $C_3$ be the set of sequences in $\ell^2(\mathbb{Z})$ that are constant over intervals of size 3, that is, a
sequence $x_n$ belongs to $C_3$ if

$$x_{3k} = x_{3k+1} = x_{3k+2},$$

for all $k \in \mathbb{Z}$.

(i) Prove that $C_3$ is a subspace of $\ell^2(\mathbb{Z})$.

(ii) Find an orthonormal basis for $C_3$, that is, describe the set of basis vectors, prove
that the set is orthonormal and prove that any element in $C_3$ can be written in
terms of those vectors.
(iii) Using filtering, upsampling and downsampling, show how to implement an orthogonal projection from $\ell^2(\mathbb{Z})$ to $C_3$.
(iv) Construct an orthogonal filter bank such that the projection operator from the
previous part corresponds to one of the branches of the analysis/synthesis banks.

Solution: 

(i) To prove that $C_3$ is a subspace we need to prove the following (finite energy of the results is inherited from $\ell^2(\mathbb{Z})$ being a vector space):

(a) If x, y are in $C_3$, then $z = x + y$ is in $C_3$ as well. For $i = 0, 1, 2$,

$$z_{3k} = x_{3k} + y_{3k} = x_{3k+i} + y_{3k+i} = z_{3k+i}.$$

(b) If x is in $C_3$, then $z = \alpha x$, where $\alpha$ is a real or a complex number, is in $C_3$ as
well. For $i = 0, 1, 2$,

$$z_{3k} = \alpha x_{3k} = \alpha x_{3k+i} = z_{3k+i}.$$

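(ii) Define the set of basis vectors as $\{\varphi_{k,n} = \varphi_{n-3k} \mid n, k \in \mathbb{Z}\}$, with $\varphi_n = 1/\sqrt{3}$ for
$n = 0, 1, 2$, and 0 otherwise.

(a) To prove that the set is orthonormal, write

$$\langle \varphi_{k,n}, \varphi_{\ell,n}\rangle = \sum_n \varphi_{k,n}\,\varphi^*_{\ell,n} = \sum_n \varphi_{n-3k}\,\varphi_{n-3\ell}.$$

In the previous sum, $\varphi_{n-3k}$ is nonzero for $n = 3k, 3k+1, 3k+2$, while $\varphi_{n-3\ell}$
is nonzero for $n = 3\ell, 3\ell+1, 3\ell+2$. These supports are disjoint for $k \neq \ell$, while for $k = \ell$ the sum equals $3 \cdot (1/\sqrt{3})^2 = 1$. Thus

$$\langle \varphi_{k,n}, \varphi_{\ell,n}\rangle = \delta_{k-\ell}.$$

(b) To prove completeness, we have to show that any element in $C_3$ can be written
as

$$x_n = \sum_k \langle \varphi_{k,\ell}, x_\ell\rangle\, \varphi_{k,n}.$$

The transform coefficients are given by

$$\langle \varphi_{k,\ell}, x_\ell\rangle = \frac{1}{\sqrt{3}}\left(x_{3k} + x_{3k+1} + x_{3k+2}\right).$$

Write now $n = 3m + i$, with $i = 0$, 1, or 2. Then

$$\sum_k \langle \varphi_{k,\ell}, x_\ell\rangle\, \varphi_{k,n} = \frac{1}{\sqrt{3}}\sum_k \left(x_{3k} + x_{3k+1} + x_{3k+2}\right)\varphi_{3(m-k)+i}.$$

Now, $\varphi_{3(m-k)+i}$ is nonzero only for $q = 3(m-k) + i$ equal to 0, 1, or 2. Since
$3(m-k) = q - i$, we can check all possible combinations of q and i and keep only
those for which $q - i$ is an integer multiple of 3. This leads to $q = i$, or $k = m$,
and thus

$$\sum_k \langle \varphi_{k,\ell}, x_\ell\rangle\, \varphi_{k,n} = \frac{1}{\sqrt{3}}\left(x_{3m} + x_{3m+1} + x_{3m+2}\right)\frac{1}{\sqrt{3}}.$$

Since one of the elements $x_{3m}$, $x_{3m+1}$, or $x_{3m+2}$ is equal to $x_n$, and x
belongs to $C_3$, we have $x_n = x_{3m} = x_{3m+1} = x_{3m+2}$, and thus the right-hand side equals

$$\frac{1}{\sqrt{3}} \cdot 3x_n \cdot \frac{1}{\sqrt{3}} = x_n.$$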
(iii) We know that the orthogonal projection is

$$\hat{x}_n = \sum_k \langle \psi_{k,\ell}, x_\ell\rangle\, \psi_{k,n},$$

where $\{\psi_{k,\ell}\}$ is an orthonormal basis for $C_3$. We use the orthonormal basis for $C_3$
we just constructed; that is, we choose $\psi_k = \varphi_k$ as our basis functions. We can check
that the output $\hat{x}_n$ is indeed in $C_3$. For $i = 0, 1, 2$,

$$\hat{x}_{3m+i} = \frac{1}{\sqrt{3}}\left(x_{3m} + x_{3m+1} + x_{3m+2}\right)\frac{1}{\sqrt{3}} = \frac{1}{3}\left(x_{3m} + x_{3m+1} + x_{3m+2}\right),$$

which does not depend on i. If we denote $g_n = (\delta_n + \delta_{n-1} + \delta_{n-2})/\sqrt{3}$, we can easily implement this orthogonal
projection operator as filtering by $g_{-n}$, downsampling by 3, upsampling by 3, and
filtering by $g_n$.
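(iv) We construct a 3-channel filter bank with sampling by 3 where the synthesis lowpass
filter is given by $g_{0,n} = \varphi_n = 1/\sqrt{3}$ for $n = 0, 1, 2$ and 0 otherwise. Thus, we know
the first row of the analysis polyphase matrix. We can use the simplest 3-channel
orthogonal matrix given in (8.15a):

$$\Phi_p(z) = \begin{bmatrix} \cos\alpha_1 & 0 & \sin\alpha_1 \\ 0 & 1 & 0 \\ -\sin\alpha_1 & 0 & \cos\alpha_1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_2 & \sin\alpha_2 \\ 0 & -\sin\alpha_2 & \cos\alpha_2 \end{bmatrix} = \begin{bmatrix} \cos\alpha_1 & -\sin\alpha_1\sin\alpha_2 & \sin\alpha_1\cos\alpha_2 \\ 0 & \cos\alpha_2 & \sin\alpha_2 \\ -\sin\alpha_1 & -\cos\alpha_1\sin\alpha_2 & \cos\alpha_1\cos\alpha_2 \end{bmatrix}.$$

Since we know the first row, we know that $\cos\alpha_1 = 1/\sqrt{3}$. Therefore, $\sin\alpha_1 = \sqrt{2}/\sqrt{3}$. From the second element in the first row, we get that $-\sin\alpha_1\sin\alpha_2 = 1/\sqrt{3}$. Thus, $\sin\alpha_2 = -1/\sqrt{2}$ and finally $\cos\alpha_2 = \pm 1/\sqrt{2}$. The "+" sign is the one we need, since it makes the third element of the first row, $\sin\alpha_1\cos\alpha_2$, equal to $1/\sqrt{3}$. The final polyphase matrix is

$$\Phi_p(z) = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ -\sqrt{2}/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{6} \end{bmatrix}.$$

It is easy to check that $\Phi_p(z)\,\Phi_p^*(z^{-1}) = I$.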
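The projection in (iii) and the matrix in (iv) are easy to check numerically. The Python sketch below (function names are illustrative) implements the analysis branch as inner products with $\varphi_{n-3k}$, that is, filtering by $g_{-n}$ followed by downsampling by 3, and the synthesis branch as upsampling by 3 followed by filtering by $g_n$. It assumes a finite input starting at $n = 0$ whose length is a multiple of 3.

```python
import math


def project_onto_C3(x):
    """Orthogonal projection onto C3: filter by g_{-n}, downsample by 3, upsample by 3, filter by g_n,
    with g_n = 1/sqrt(3) for n = 0, 1, 2. Assumes len(x) is a multiple of 3."""
    g = [1 / math.sqrt(3)] * 3
    K = len(x) // 3
    # analysis branch: (x * g_{-n}) sampled at 3k equals sum_m g_m x_{3k+m} = <x, phi_k>
    coeffs = [sum(g[m] * x[3 * k + m] for m in range(3)) for k in range(K)]
    # synthesis branch: upsample by 3, then filter with g_n, i.e. sum_k coeffs_k g_{n-3k}
    y = [0.0] * (3 * K)
    for k, c in enumerate(coeffs):
        for m in range(3):
            y[3 * k + m] += c * g[m]
    return y


# constant polyphase matrix obtained in part (iv), stored as a list of rows
s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
PHI_P = [
    [1 / s3, 1 / s3, 1 / s3],
    [0.0, 1 / s2, -1 / s2],
    [-s2 / s3, 1 / s6, 1 / s6],
]


def is_orthogonal(a, tol=1e-12):
    """Check A A^T = I for a real square matrix given as a list of rows."""
    n = len(a)
    return all(abs(sum(a[i][k] * a[j][k] for k in range(n)) - (i == j)) < tol
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(project_onto_C3([3.0, 0.0, 0.0, 1.0, 2.0, 3.0]))  # block averages, approximately [1, 1, 1, 2, 2, 2]
    print(is_orthogonal(PHI_P))
```

Applying `project_onto_C3` twice gives the same result as applying it once, as expected of a projection.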

8.2. Factorization of Complex Exponential-Modulated Modulation Matrices

Given is a 3-channel complex exponential-modulated local Fourier filter bank as in (8.16).
We call the following circulant matrix:

$$\Phi_m(z) = \begin{bmatrix} G_0(z) & G_1(z) & G_2(z) \\ G_0(W_3 z) & G_1(W_3 z) & G_2(W_3 z) \\ G_0(W_3^2 z) & G_1(W_3^2 z) & G_2(W_3^2 z) \end{bmatrix} = \begin{bmatrix} G(z) & G(W_3 z) & G(W_3^2 z) \\ G(W_3 z) & G(W_3^2 z) & G(z) \\ G(W_3^2 z) & G(z) & G(W_3 z) \end{bmatrix}$$

the modulation matrix.

(i) Find the relationship between the modulation matrix $\Phi_m(z)$ and the polyphase
matrix $\Phi_p(z)$.
(ii) Show how to diagonalize $\Phi_m(z)$.
(iii) Give a form of the determinant $\det(\Phi_m(z))$.

Solution: The gist of this problem is to solve (i), that is, find the relationship between
the polyphase and modulation matrices.

(i) We start with N = 2, as it is easier to see the relationship between the polyphase
and modulation matrices in that case:

$$X(z) + X(-z) = 2\sum_{n\in\mathbb{Z}} x_{2n} z^{-2n} = 2X_0(z^2),$$
$$X(z) - X(-z) = 2z^{-1}\sum_{n\in\mathbb{Z}} x_{2n+1} z^{-2n} = 2z^{-1}X_1(z^2),$$

which can be compactly represented as

$$X_p(z^2) = \begin{bmatrix} X_0(z^2) \\ X_1(z^2) \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} X(z) \\ X(-z) \end{bmatrix} = \frac{1}{2}\,\mathrm{diag}([1\;\; z])\, F X_m(z).$$

In other words, the modulation and polyphase domain representations are related
via the DFT. We can easily generalize this to arbitrary N to obtain:

$$X_p(z^N) = \frac{1}{N}\,\mathrm{diag}([1\;\; z\;\; \cdots\;\; z^{N-1}])\, F X_m(z), \qquad (E8.2\text{-}1a)$$
$$\Phi_p(z^N) = \frac{1}{N}\,\mathrm{diag}([1\;\; z\;\; \cdots\;\; z^{N-1}])\, F \Phi_m(z), \qquad (E8.2\text{-}1b)$$

where F is the DFT matrix from (2.161a).
(ii) Since the second and third filters are modulates of the first one, we can rewrite the
polyphase matrix as (with $W = W_3$)

$$\Phi_p(z) = \begin{bmatrix} G_{0,0}(z) & G_{1,0}(z) & G_{2,0}(z) \\ G_{0,1}(z) & G_{1,1}(z) & G_{2,1}(z) \\ G_{0,2}(z) & G_{1,2}(z) & G_{2,2}(z) \end{bmatrix} \overset{(a)}{=} \begin{bmatrix} P_0(z) & P_0(z) & P_0(z) \\ P_1(z) & W^2 P_1(z) & W P_1(z) \\ P_2(z) & W P_2(z) & W^2 P_2(z) \end{bmatrix} = \mathrm{diag}([P_0(z)\;\; P_1(z)\;\; P_2(z)])\, F^*,$$

where (a) follows from (8.11) (this was also derived in (8.17)). Using this and
(E8.2-1b), we have that

$$\mathrm{diag}([P_0(z^3)\;\; P_1(z^3)\;\; P_2(z^3)])\, F^* = \frac{1}{3}\,\mathrm{diag}([1\;\; z\;\; z^2])\, F \Phi_m(z),$$

and

$$\frac{1}{3}\,\mathrm{diag}([1\;\; z\;\; z^2])\, F \Phi_m(z) (F^*)^{-1} = \mathrm{diag}([P_0(z^3)\;\; P_1(z^3)\;\; P_2(z^3)]),$$

or, finally,

$$F \Phi_m(z) F = 9\,\mathrm{diag}([P_0(z^3)\;\; z^{-1}P_1(z^3)\;\; z^{-2}P_2(z^3)]),$$

where we have used the expression for the adjoint of F in (2.161b), that is, $(F^*)^{-1} = F/3$.
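(iii) Using the factorization from (ii),

$$\det(\Phi_m(z)) = \det\left(9F^{-1}\,\mathrm{diag}([P_0(z^3)\;\; z^{-1}P_1(z^3)\;\; z^{-2}P_2(z^3)])\,F^{-1}\right) = 9^3 \det(F^{-1})^2\, z^{-3}\prod_{j=0}^{2}P_j(z^3)$$
$$\overset{(a)}{=} 9^3\left(-\frac{1}{27}\right) z^{-3}\prod_{j=0}^{2}P_j(z^3) = -27\, z^{-3}\prod_{j=0}^{2} P_j(z^3),$$

where (a) follows from (2.162) together with $\det(F) = 3\sqrt{3}\,j$ for N = 3, so that $\det(F^{-1})^2 = 1/\det(F)^2 = -1/27$.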
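A quick numerical sanity check of (E8.2-1a) and of the determinant in (iii): the Python sketch below evaluates everything at a single point z for a random real prototype of length 9 (an arbitrary choice), using $G_i(z) = P(W_3^i z)$, that is, writing the prototype G of the problem statement as P. Function names are illustrative.

```python
import cmath
import random


def ztrans(h, z):
    """H(z) = sum_n h[n] z^(-n) for a finite causal sequence h."""
    return sum(c * z ** (-n) for n, c in enumerate(h))


def det3(a):
    """Determinant of a 3x3 matrix (list of rows) by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def check_at(p, z):
    """For N = 3 and G_i(z) = P(W^i z), return (max error of (E8.2-1a) applied to p,
    det(Phi_m(z)), and the closed form -27 z^(-3) P_0(z^3) P_1(z^3) P_2(z^3))."""
    N = 3
    W = cmath.exp(-2j * cmath.pi / N)
    P = [ztrans(p[j::N], z ** N) for j in range(N)]  # polyphase components P_j(z^3)
    # (E8.2-1a), row j: P_j(z^3) = (1/3) z^j sum_k W^(jk) P(W^k z)
    err = max(abs(z ** j / N * sum(W ** (j * k) * ztrans(p, W ** k * z) for k in range(N)) - P[j])
              for j in range(N))
    # modulation matrix: row k, column i holds G_i(W^k z) = P(W^(i+k) z)
    phi_m = [[ztrans(p, W ** (i + k) * z) for i in range(N)] for k in range(N)]
    return err, det3(phi_m), -27 * z ** (-3) * P[0] * P[1] * P[2]


if __name__ == "__main__":
    random.seed(3)
    p = [random.uniform(-1, 1) for _ in range(9)]  # arbitrary real prototype of length 9
    err, det_value, closed_form = check_at(p, 0.8 * cmath.exp(0.3j))
    print(err, abs(det_value - closed_form))  # both are at the level of rounding error
```

For the 3-point average $p = [1, 1, 1]$ at $z = 1$, all $P_j$ equal 1 and the modulation matrix is 3 times a permutation matrix with determinant $-1$, which matches the value $-27$.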

8.3. Discrete Cosine Transform 

The DCT is a block transform, where each basis sequence is a modulated version of an
N-point average. One reason for its popularity is that the DCT matrix can be recursively
factored in a fashion similar to the DFT; it can also be computed with a DFT of the same
length and simple pre- and post-processing.

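There are eight versions of the DCT; here, we give the basis sequences of the most
well-known one, DCT-2:

$$\varphi_{0,n} = g_{0,n} = p_n = \frac{1}{\sqrt{N}}, \qquad (E8.3\text{-}1a)$$
$$\varphi_{i,n} = g_{i,n} = p_n\sqrt{2}\cos\left(\frac{2\pi}{2N}\left(n + \frac{1}{2}\right)i\right), \qquad (E8.3\text{-}1b)$$

for $i = 1, 2, \ldots, N-1$, and $n = 0, 1, \ldots, N-1$. Prove that the DCT is an orthonormal
basis for $\mathbb{R}^N$, by proving that

$$\langle \varphi_i, \varphi_\ell\rangle = \delta_{i-\ell}.$$

Do this in steps: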
Figure E8.3-1: DCT for N = 8. (a) The eight basis sequences $g_{0,n}, \ldots, g_{7,n}$. (b) Magnitude responses $|G_0(e^{j\omega})|, |G_1(e^{j\omega})|, \ldots, |G_7(e^{j\omega})|$
of the basis sequences, showing the uniform split of the spectrum.



(i) Prove that $\langle \varphi_i, \varphi_0\rangle = 0$, for $i = 1, 2, \ldots, N-1$.
(ii) Prove that $\langle \varphi_i, \varphi_\ell\rangle = 0$, for $i \neq \ell$, $i, \ell = 1, 2, \ldots, N-1$.
(iii) Prove that $\|\varphi_i\| = 1$ for $i = 0, 1, \ldots, N-1$.

For N = 8, numerically compare the DTFT magnitudes of the DCT basis sequences to
those of the LOT basis sequences from (8.30).

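Solution: The first three parts show that the DCT is an orthonormal basis for $\mathbb{R}^N$.
(i) We show that all the basis sequences are orthogonal to the prototype one:

$$\langle\varphi_i, \varphi_0\rangle \overset{(a)}{=} \frac{\sqrt{2}}{N}\sum_{n=0}^{N-1}\cos\left(\frac{2\pi}{2N}\,i\left(n+\frac{1}{2}\right)\right) \overset{(b)}{=} \frac{1}{\sqrt{2}N}\sum_{n=0}^{N-1}\left(W_{4N}^{i(2n+1)} + W_{4N}^{-i(2n+1)}\right)$$
$$= \frac{1}{\sqrt{2}N}\left(W_{4N}^{i}\sum_{n=0}^{N-1}W_{4N}^{2in} + W_{4N}^{-i}\sum_{n=0}^{N-1}W_{4N}^{-2in}\right) \overset{(c)}{=} \frac{1}{\sqrt{2}N}\left(W_{4N}^{i}\,\frac{1 - W_{4N}^{2iN}}{1 - W_{4N}^{2i}} + W_{4N}^{-i}\,\frac{1 - W_{4N}^{-2iN}}{1 - W_{4N}^{-2i}}\right),$$

where (a) follows from (E8.3-1); (b) from (2.275); and (c) from the finite-sum formula
(P1.65-1). For i even, $W_{4N}^{2iN} = 1$, and the inner product is zero. For i odd, $W_{4N}^{2iN} = -1$, and the above becomes

$$\langle\varphi_i, \varphi_0\rangle = \frac{\sqrt{2}}{N}\,\frac{W_{4N}^{i}\left(1 - W_{4N}^{-2i}\right) + W_{4N}^{-i}\left(1 - W_{4N}^{2i}\right)}{\left(1 - W_{4N}^{2i}\right)\left(1 - W_{4N}^{-2i}\right)} = 0,$$

since the numerator equals $W_{4N}^{i} - W_{4N}^{-i} + W_{4N}^{-i} - W_{4N}^{i} = 0$.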
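(ii) The argument follows a similar line as in (i), so we sketch it only:

$$\langle\varphi_i, \varphi_\ell\rangle = \frac{2}{N}\sum_{n=0}^{N-1}\cos\left(\frac{2\pi}{2N}\,i\left(n+\frac{1}{2}\right)\right)\cos\left(\frac{2\pi}{2N}\,\ell\left(n+\frac{1}{2}\right)\right) = \frac{1}{2N}\sum_{n=0}^{N-1}\left(W_{4N}^{i(2n+1)} + W_{4N}^{-i(2n+1)}\right)\left(W_{4N}^{\ell(2n+1)} + W_{4N}^{-\ell(2n+1)}\right)$$
$$= \frac{1}{2N}\left(W_{4N}^{i+\ell}\sum_{n=0}^{N-1}W_{4N}^{2(i+\ell)n} + W_{4N}^{-(i+\ell)}\sum_{n=0}^{N-1}W_{4N}^{-2(i+\ell)n} + W_{4N}^{i-\ell}\sum_{n=0}^{N-1}W_{4N}^{2(i-\ell)n} + W_{4N}^{-(i-\ell)}\sum_{n=0}^{N-1}W_{4N}^{-2(i-\ell)n}\right).$$

The first two terms are equivalent to the problem solved in (i) with $i' = i + \ell$; the
same is true for the last two terms, just with $i' = i - \ell$. The argument in (i) only requires $W_{4N}^{2i'} \neq 1$, that is, $i'$ not a multiple of 2N, which holds here since $2 \le i + \ell \le 2N-2$ and $0 < |i - \ell| \le N-2$. This proves that the inner
product is zero.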
(iii) That $\varphi_0$ is of unit norm is obvious. For $\varphi_i$, $i = 1, 2, \ldots, N-1$, we have that

$$\langle\varphi_i, \varphi_i\rangle = \frac{1}{2N}\sum_{n=0}^{N-1}\left(W_{4N}^{i(2n+1)} + W_{4N}^{-i(2n+1)}\right)^2 = \frac{1}{2N}\left(\sum_{n=0}^{N-1}W_{4N}^{2i(2n+1)} + \sum_{n=0}^{N-1}W_{4N}^{-2i(2n+1)} + 2\sum_{n=0}^{N-1}1\right).$$

The first two sums are zero by the same argument as in (i) (with $i' = 2i$). The third sum equals
N, showing that the norm of $\varphi_i$ is indeed 1.



The DCT basis sequences as well as their magnitude responses are given in Figure E8.3-1.
Compare this to their LOT counterpart in Figure 8.14.
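Parts (i) to (iii) can also be confirmed numerically: the Gram matrix of the DCT-2 basis (E8.3-1) should be the identity. A minimal Python sketch (function names are illustrative):

```python
import math


def dct2_basis(N):
    """DCT-2 basis of (E8.3-1): phi_0[n] = 1/sqrt(N), phi_i[n] = sqrt(2/N) cos((2*pi/(2N)) (n + 1/2) i)."""
    basis = [[1 / math.sqrt(N)] * N]
    for i in range(1, N):
        basis.append([math.sqrt(2 / N) * math.cos(math.pi / N * (n + 0.5) * i) for n in range(N)])
    return basis


def gram(vectors):
    """Matrix of inner products <u, v> between real vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


if __name__ == "__main__":
    N = 8
    G = gram(dct2_basis(N))
    # largest deviation from the identity matrix; tiny, i.e. rounding error only
    print(max(abs(G[i][j] - (i == j)) for i in range(N) for j in range(N)))
```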
8.4. Complexity of a Cosine-Modulated Local Fourier Filter Bank for Audio Compression

Like the MPEG filter bank in Example 8.5, a 32-channel filter bank with filters of length
512 as in (8.43) is used in a standard called MUSICAM.$^{125}$ This filter bank can be implemented using polyphase filters and an FFT. Assuming an input sampling rate of 44.1 kHz,
give the number of operations per second required to compute the filter bank.

Solution: The complexity of computing the analysis filter bank is the same as that of
computing the synthesis filter bank. We use (8.44) with M = 32, L = 512, and thus
K = L/M = 16. The number of operations per sample equals

$$2\,\frac{L}{M} + 2\log_2 M = 2\cdot\frac{512}{32} + 2\log_2 32 = 2(16 + 5) = 42.$$

With 44,100 samples per second, this amounts to 1,852,200 operations per second.
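The arithmetic above, written as a tiny Python helper. The per-sample count $2L/M + 2\log_2 M$ is the one used in this solution; the function name is illustrative, and the sketch assumes M is a power of two and L a multiple of M.

```python
import math


def filter_bank_ops(M, L, fs):
    """Per-sample count 2*L/M + 2*log2(M) used in this solution, and the resulting operations per second."""
    per_sample = 2 * (L // M) + 2 * round(math.log2(M))
    return per_sample, per_sample * fs


if __name__ == "__main__":
    print(filter_bank_ops(32, 512, 44_100))  # (42, 1852200)
```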



Exercises 

8.1. Projection Operator 

Given is the system in Figure P8.1-1 with $\langle a_n, a_{n-2k}\rangle = \delta_k$ and $\langle b_n, b_{n-3k}\rangle = \delta_k$.
(i) Is P in $\hat{x} = Px$ a projection? Why? Prove it.

(ii) Redraw the above block diagram as below and give expressions for $c_n$, $d_n$, M and N
in any domain convenient.



125 Actually, in a real MUSICAM system, the modulation is with cosines and the implementation
involves polyphase filters and a fast DCT, and is thus very similar to the complex case we analyze
here.
Figure P8.1-1: System for Exercise 8.1.



(iii) If $a_n = (\delta_n + \delta_{n-1})/\sqrt{2}$ and $b_n = (\delta_n + \delta_{n-1} + \delta_{n-2})/\sqrt{3}$, what are $c_n$ and $d_n$?
(iv) For $a_n$ and $b_n$ as in (iii), does the following hold:

$$\langle c_n, c_{n-6k}\rangle = \delta_k? \qquad (P8.1\text{-}1)$$

(v) For general $a_n$ and $b_n$ as in (ii), does (P8.1-1) hold?

8.2. Biorthogonal Relations

Prove that the system shown in Figure P8.2-1 being the identity is equivalent to the following:

$$\langle \tilde{g}_{Nk-n}, g_{n-N\ell}\rangle = \delta_{k-\ell}.$$

Figure P8.2-1: System for Exercise 8.2.



8.3. Biorthogonality in N-Channel Perfect Reconstruction Filter Banks
Given is the modulation matrix $\Phi_m(z)$:

$$\Phi_m(z) = \begin{bmatrix} G_0(z) & G_1(z) & G_2(z) & \cdots & G_{N-1}(z) \\ G_0(Wz) & G_1(Wz) & G_2(Wz) & \cdots & G_{N-1}(Wz) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ G_0(W^{N-1}z) & G_1(W^{N-1}z) & G_2(W^{N-1}z) & \cdots & G_{N-1}(W^{N-1}z) \end{bmatrix}, \qquad (P8.3\text{-}1)$$

and similarly for the modulation matrix on the analysis side, $\tilde{\Phi}_m(z)$. Find the perfect
reconstruction condition in terms of the modulation matrices $\Phi_m$ and $\tilde{\Phi}_m$.



8.4. Complex Exponential-Modulated Local Fourier Basis with Ideal Filters

Consider an ideal Nth-band filter G from Table 2.5 and its modulations as in (8.16).

(i) Prove that the impulse responses and their shifts by multiples of N,

$$\{g_{i,n-kN}\}_{k\in\mathbb{Z},\; i\in\{0,1,\ldots,N-1\}},$$

form an orthonormal set.
(ii) Prove that all filters are modulates of the prototype filter p, following (8.16), both
in time and frequency domains.

8.5. Conditioning of Complex Exponential-Modulated Local Fourier Filter Banks

Given is a complex exponential-modulated local Fourier filter bank with filters $G_i$ as
in (8.16), and their polyphase components $G_{i,j}$, $i, j = 0, 1, \ldots, N-1$, as in (8.12d).
(i) Take N = 2, where the two filters are $G_0(z)$ and $G_1(z) = G_0(-z)$. Show that the
conditioning of the filter bank is given by the conditioning of the matrix

$$\begin{bmatrix} |G_{0,0}(e^{j\omega})|^2 & 0 \\ 0 & |G_{0,1}(e^{j\omega})|^2 \end{bmatrix},$$

where, by conditioning, we mean that we want to find bounds $\alpha$ and $\beta$ such that

$$\alpha\|x\|^2 \le \sum_{i=0}^{N-1}\|\alpha_i\|^2 \le \beta\|x\|^2,$$

where the $\alpha_{i,k} = \langle x, g_{i,n-Nk}\rangle$ are the N channel signals.

(Hint: Take a fixed frequency $\omega_0$ and find $\alpha(\omega_0)$ and $\beta(\omega_0)$. Then extend the
argument to $\omega \in [-\pi, \pi)$.)
(ii) Compute the bounds $\alpha$ and $\beta$ for the following filters:

  Filter               $G_0(z)$
  Haar                 $\frac{1}{\sqrt{2}}(1 + z^{-1})$
  Ideal halfband       $G_0(e^{j\omega}) = \sqrt{2}$ for $|\omega| < \pi/2$; 0 otherwise
  4-point average      $\frac{1}{2}(1 + z^{-1} + z^{-2} + z^{-3})$

Windowed average -J- (1 + z -1 + z~ 2 + ^z~ 3 ) 

(iii) Extend the argument to general N.

(iv) Numerically compute $\alpha$ and $\beta$ for the triangular windows below:

  N    $G_0(z)$
  4    $1 + 2z^{-1} + 3z^{-2} + 4z^{-3} + 4z^{-4} + 3z^{-5} + 2z^{-6} + z^{-7}$
  3    $1 + 2z^{-1} + 3z^{-2} + 4z^{-3} + 5z^{-4} + 4z^{-5} + 3z^{-6} + 2z^{-7} + z^{-8}$

8.6. Block-Based Power Spectral Density Estimation of White Noise
Consider a white noise process, that is, $x_n$ is i.i.d. with variance $\sigma^2$.

(i) Show that the entries of $B_n$ in (8.21) are i.i.d. with variance $\sigma^2$, irrespective of the
size M of the DFT.
(ii) Show that the average power spectrum (8.25) has variance $(1/K)\sigma^2$.

8.7. Power Spectral Density Estimation Using Windowing

Consider windows p of odd size M, centered at the origin (such windows are not causal,
but can be made so with a right shift by (M - 1)/2). For numerical evaluations and plots,
use M = 31.

(i) Rectangular window, or, the box sequence from (2.13a):

$$p^{(r)}_n = \begin{cases} 1/\sqrt{M}, & |n| \le (M-1)/2; \\ 0, & \text{otherwise}. \end{cases} \qquad (P8.7\text{-}1)$$

Calculate its DTFT, the width of the main lobe (defined as the distance between
zeroes of the DTFT around the origin) and the height of the tallest side lobe. Note
that the DTFT of $p^{(r)}$ is the Dirichlet kernel of order (M - 1)/2.
(ii) Triangular (Bartlett) window:

$$p^{(t)}_n = \begin{cases} 1 - \dfrac{|n|}{(M+1)/2}, & |n| \le (M-1)/2; \\ 0, & \text{otherwise}. \end{cases} \qquad (P8.7\text{-}2)$$

Calculate its DTFT, the width of the main lobe and the height of the tallest side
lobe.

(Hint: The triangular window can be seen as the convolution of a rectangular window
of size (M + 1)/2 with itself.)
(iii) Comparing the two windows above, $p^{(r)}$ and $p^{(t)}$, show that, from (i) to (ii), the
main lobe doubles in width while the height of the tallest side lobe, measured in dB relative to the main lobe, doubles in magnitude.
(iv) Hamming window:

$$p^{(h)}_n = \begin{cases} 0.54 - 0.46\cos\left(\frac{2\pi}{M-1}\left(n - \frac{M-1}{2}\right)\right), & |n| \le (M-1)/2; \\ 0, & \text{otherwise}. \end{cases} \qquad (P8.7\text{-}3)$$

Calculate its DTFT, the width of the main lobe and the height of the tallest side
lobe.
(v) Verify that the Hamming window can be written in the frequency domain as the sum
of three Dirichlet kernels, one at the origin and weighted by 0.54, and the other two
weighted by 0.23, shifted by $\pm 2\pi/(M-1)$. From this, give the expression for the
width of the main lobe.

8.8. Parametric Spectral Estimation

Consider signals consisting of complex sinusoids and additive white Gaussian noise. To
detect a sinusoid, one takes a windowed DFT and looks for maxima in the squared magnitude.

(i) Compare qualitatively the rectangular window $p^{(r)}$ from (P8.7-1) and the triangular
window $p^{(t)}$ from (P8.7-2) for the following cases:

  (a) Single complex sinusoid in low versus high noise.
  (b) Two closely spaced complex sinusoids in low versus high noise.
(ii) Take M = 64 and generate sequences with one or two sinusoids, and noise levels of
the order of the largest side lobe of the rectangular and triangular windows $p^{(r)}$ and
$p^{(t)}$ (see Exercise 8.7). Compare the detection of sinusoids in these various cases.

8.9. Transmultiplexers

For the transmultiplexer shown in Figure 8.10, verify the following for N = 3 and N = 4:

(i) If $G_0(z) = \tilde{G}_0(z) = (1/\sqrt{N})(1 + z^{-1} + z^{-2} + \ldots + z^{-N+1})$ and the filter bank is
modulated as in (8.16), then the system is perfect reconstruction.
(ii) If $G_0(z) = \tilde{G}_0(z)$ are ideal Nth-band filters and the filter bank is modulated as in
(8.16), then the system is perfect reconstruction.
(iii) If $G_0(z)$ is of finite length (but larger than N), and the filter bank is modulated as
in (8.16), derive the input-output relationship in terms of the polyphase components
of $G_0(z)$ and $\tilde{G}_0(z)$.
(Hint: Use the factorization analogous to the one in (8.17).)
Chapter 9

If the projection of the signal onto two subspaces is advantageous, projecting
onto more subspaces might be even better. These projections onto multiple subspaces are implemented via multichannel filter banks, which come in various flavors.
For example, there are direct multichannel filter banks, with N filters covering the
entire spectrum and their outputs downsampled by N, covered in Chapter 8. There
are also tree-structured multichannel filter banks, where a two-channel filter bank
from Chapter 7 is used as a building block for more complex structures. While we
will discuss arbitrary tree structures later in this chapter, most of the chapter deals
with a particularly simple one that has some distinguishing features, from both
mathematical and practical points of view. This elementary tree structure
recursively splits the coarse space into ever coarser ones, yielding, in signal processing parlance, an octave-band filter bank. The input spectrum (subspace) from 0 to
$\pi$ is cut into a highpass part from $\pi/2$ to $\pi$, with the remainder cut again into $\pi/4$
to $\pi/2$ and a new remainder from 0 to $\pi/4$, and so on. As an example, performing
the split three times leads to the following 4-channel spectral division:

$$\left[0, \frac{\pi}{8}\right), \quad \left[\frac{\pi}{8}, \frac{\pi}{4}\right), \quad \left[\frac{\pi}{4}, \frac{\pi}{2}\right), \quad \left[\frac{\pi}{2}, \pi\right],$$

yielding a lowpass (coarse) version and three bandpass (detail) versions, each of which
corresponds to an octave of the initial spectrum, shown in Figure 9.1(c).$^{126}$
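The octave-band split and the constant relative bandwidth described in footnote 126 can be tabulated with exact rational arithmetic. In the Python sketch below (names are illustrative), band edges are expressed in units of $\pi$, and whether the top band is closed at $\pi$ is not tracked.

```python
from fractions import Fraction


def octave_bands(J):
    """Band edges, in units of pi, after J octave splits: [0, 2^-J), then [2^-j, 2^-(j-1)) for j = J..1."""
    bands = [(Fraction(0), Fraction(1, 2 ** J))]
    for j in range(J, 0, -1):
        bands.append((Fraction(1, 2 ** j), Fraction(1, 2 ** (j - 1))))
    return bands


def relative_bandwidth(band):
    """Q = center frequency / bandwidth; meaningful for the bandpass channels."""
    lo, hi = band
    return ((lo + hi) / 2) / (hi - lo)


if __name__ == "__main__":
    for lo, hi in octave_bands(3):
        print(f"[{lo}, {hi}) x pi")
    print([relative_bandwidth(b) for b in octave_bands(3)[1:]])  # [Fraction(3, 2), Fraction(3, 2), Fraction(3, 2)]
```

Every bandpass channel has the same Q = 3/2, independent of its octave.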

Such an unbalanced tree-structured filter bank, shown in Figure 9.1, is a central
concept in both filter banks and wavelets. Most of this chapter is devoted to
its study, properties, and geometrical interpretation. In wavelet parlance, when the
lowpass filter is designed appropriately, the filter bank computes a discrete wavelet
transform (DWT). Even more is true: the same construction can be used to derive
continuous-time wavelet bases, and the filter bank leads to an algorithm to compute
wavelet series coefficients, the topic of Chapter 12.



9.1 Introduction 

The iterated structure from Figure 9.1(a) clearly performs only the analysis operation. Given what we have learned so far, the channel sequences $\beta^{(1)}$, $\beta^{(2)}$, $\beta^{(3)}$,
$\alpha^{(3)}$ compute projection coefficients onto some, yet unidentified, subspaces; it is
left to establish which expansion has these as its projection/transform coefficients.
Moreover, we should be able to then express the entire expansion using filter banks
as we have done in Chapter 7. It is not difficult to see that the synthesis filter bank
corresponding to the analysis one from Figure 9.1(a) is the one in Figure 9.1(b).
Every analysis two-channel block in Figure 9.1(a) has a corresponding synthesis
two-channel block in Figure 9.1(b). We can thus use the whole machinery from
Chapter 7 to study such an iterated structure. Moreover, the example with J = 3
levels can be easily generalized to an arbitrary number of levels.

As we have done throughout the book, we introduce the main concepts of 
this chapter through our favorite example — Haar. Building upon the intuition we 
develop here, generalizations will come without surprise in the rest of the chapter. 



Implementing a Haar DWT Expansion 

We start with a 3-level iterated filter bank structure as in Figure 9.1, where the two-channel filter bank block is the Haar orthogonal filter bank from Table 7.8, with synthesis filters

$$G(z) = \frac{1}{\sqrt{2}}(1 + z^{-1}), \qquad H(z) = \frac{1}{\sqrt{2}}(1 - z^{-1}).$$



126 Another interpretation of octave-band filter banks is that the bandpass channels have constant relative bandwidth. For a bandpass channel, its relative bandwidth $Q$ is defined as its center frequency divided by its bandwidth. In the example above, the channels go from $\pi/2^{\ell+1}$ to $\pi/2^{\ell}$, with the center frequency $3\pi/2^{\ell+2}$ and bandwidth $\pi/2^{\ell+1}$. The relative bandwidth $Q$ is then $3/2$. In classic circuit theory, the relative bandwidth is called the Q-factor, and the filter bank above has constant-Q bandpass channels.
Figure 9.1: A two-channel orthogonal filter bank iterated three times to obtain one coarse subspace with support $[0, \pi/8)$, and three bandpass subspaces. (a) Analysis filter bank. (b) Synthesis filter bank. (c) The corresponding frequency division.



Equivalent Filters As mentioned earlier, we now have four channel sequences, $\beta^{(1)}, \beta^{(2)}, \beta^{(3)}, \alpha^{(3)}$, and thus, we should be able to represent the tree structure from Figure 9.1 as a 4-channel filter bank, with four channel filters and four samplers. This is our aim now.

We first consider the channel sequence $\alpha^{(3)}$ and its path through the lower branches of the first two filter banks, depicted in Figure 9.2(a). In part (b) of the same figure, we use one of the identities on the interchange of multirate operations and filtering we saw in Chapter 2 (Figure 2.22) to move the first filter $G(z)$ across the second upsampler, resulting in part (c) of the figure. In essence, we have compacted the sequence of steps "upsampler by 2, filter $G(z)$, upsampler by 2, filter $G(z)$" into the sequence of steps "upsampler by 4, equivalent filter $G^{(2)}(z) = G(z)G(z^2)$".
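The identity used in Figure 9.2 can be checked on finite sequences. The sketch below is a minimal Python illustration with hypothetical helper names `convolve` and `upsample` and an arbitrary integer filter standing in for $G(z)$; it runs path (a) and path (c) of the figure and compares the outputs. Integer coefficients keep the comparison exact.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences (product of z-transforms)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(x, m):
    """Coefficients of X(z^m): insert m - 1 zeros between samples (no trailing zeros)."""
    y = [0] * (m * (len(x) - 1) + 1)
    y[::m] = x
    return y

g = [1, 2, 3]        # any filter works; small integers keep the check exact
x = [1, -1, 4, 2]    # hypothetical input sequence

# (a) upsample by 2 -> G(z) -> upsample by 2 -> G(z)
path_a = convolve(upsample(convolve(upsample(x, 2), g), 2), g)
# (c) upsample by 4 -> equivalent filter G^(2)(z) = G(z) G(z^2)
g2 = convolve(g, upsample(g, 2))
path_c = convolve(upsample(x, 4), g2)
print(path_a == path_c)   # True
```

Both paths compute the coefficients of $X(z^4)G(z^2)G(z)$, so they agree sample by sample.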
[Figure 9.2, panels (a) to (c): block diagrams; see caption.]

Figure 9.2: Path through the lower branches of the first two filter banks in Figure 9.1. (a) Original system. (b) Use of one of the identities on the interchange of multirate operations and filtering from Figure 2.22 results in moving the filter across the upsampler by upsampling its impulse response. (c) Equivalent system consisting of a single upsampler by 4 followed by an equivalent filter $G^{(2)}(z) = G(z)G(z^2)$.



We can now iteratively continue the process by taking the equivalent filter and passing it across the third upsampler along the path of the lower branch in the last (rightmost) filter bank, resulting in a single branch with a single upsampler by 8 followed by a single equivalent filter $G^{(3)}(z) = G(z)G(z^2)G(z^4)$; this is the lowest branch of Figure 9.3.

Repeating the process on the other three branches transforms the 3-level tree-structured synthesis filter bank from Figure 9.1(b) into the 4-channel synthesis filter bank from Figure 9.3, with the equivalent filters:



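$$H^{(1)}(z) = H(z) = \frac{1}{\sqrt{2}}(1 - z^{-1}), \tag{9.1a}$$

$$H^{(2)}(z) = G(z)H(z^2) = \frac{1}{2}(1 + z^{-1})(1 - z^{-2}) = \frac{1}{2}(1 + z^{-1} - z^{-2} - z^{-3}), \tag{9.1b}$$

$$H^{(3)}(z) = G(z)G(z^2)H(z^4) = \frac{1}{2\sqrt{2}}(1 + z^{-1})(1 + z^{-2})(1 - z^{-4}) = \frac{1}{2\sqrt{2}}(1 + z^{-1} + z^{-2} + z^{-3} - z^{-4} - z^{-5} - z^{-6} - z^{-7}), \tag{9.1c}$$

$$G^{(3)}(z) = G(z)G(z^2)G(z^4) = \frac{1}{2\sqrt{2}}(1 + z^{-1})(1 + z^{-2})(1 + z^{-4}) = \frac{1}{2\sqrt{2}}(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} + z^{-6} + z^{-7}). \tag{9.1d}$$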
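The equivalent filters (9.1a)-(9.1d) can be generated mechanically by multiplying upsampled copies of $G$ and $H$. The following Python sketch does this with the Haar filters scaled by $\sqrt{2}$, so that the arithmetic stays in integers; the helper `equivalent_filters` is a hypothetical name, and each level-$\ell$ result must be multiplied by $2^{-\ell/2}$ to recover the normalized filters.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences (product of z-transforms)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(x, m):
    """Coefficients of X(z^m): insert m - 1 zeros between samples."""
    y = [0] * (m * (len(x) - 1) + 1)
    y[::m] = x
    return y

def equivalent_filters(g, h, J):
    """Equivalent synthesis filters of a J-level octave-band tree.

    Returns (G^(J), [H^(1), ..., H^(J)]) with
    H^(l)(z) = G(z) G(z^2) ... G(z^(2^(l-2))) H(z^(2^(l-1))).
    """
    lowpass = [1]                        # G^(0)(z) = 1
    bandpass = []
    for level in range(1, J + 1):
        m = 2 ** (level - 1)
        bandpass.append(convolve(lowpass, upsample(h, m)))
        lowpass = convolve(lowpass, upsample(g, m))
    return lowpass, bandpass

# Haar filters without the 1/sqrt(2) factor (integer arithmetic).
g3, (h1, h2, h3) = equivalent_filters([1, 1], [1, -1], 3)
print(h1)  # [1, -1]
print(h2)  # [1, 1, -1, -1]
print(h3)  # [1, 1, 1, 1, -1, -1, -1, -1]
print(g3)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Multiplying these by $2^{-1/2}$, $2^{-1}$, $2^{-3/2}$ and $2^{-3/2}$, respectively, gives exactly the coefficients in (9.1a)-(9.1d).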



If we repeated the above iterative process $J$ times instead, the lowpass equivalent filter would have the $z$-transform

$$G^{(J)}(z) = \prod_{\ell=0}^{J-1} G(z^{2^\ell}) = \frac{1}{2^{J/2}} \sum_{n=0}^{2^J-1} z^{-n}, \tag{9.2a}$$

that is, it is a length-$2^J$ averaging filter

$$g^{(J)}_n = \frac{1}{2^{J/2}} \sum_{k=0}^{2^J-1} \delta_{n-k}, \tag{9.2b}$$
[Figure 9.3: four branches, each an upsampler followed by one of the equivalent filters $H^{(1)}(z) = H(z)$, $H^{(2)}(z) = G(z)H(z^2)$, $H^{(3)}(z) = G(z)G(z^2)H(z^4)$, $G^{(3)}(z) = G(z)G(z^2)G(z^4)$.]

Figure 9.3: Equivalent filter bank to the 3-level synthesis bank shown in Figure 9.1(b).

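while the same-level bandpass equivalent filter follows from

$$H^{(J)}(z) = H(z^{2^{J-1}})\,G^{(J-1)}(z) = \frac{1}{\sqrt{2}}\left(G^{(J-1)}(z) - z^{-2^{J-1}} G^{(J-1)}(z)\right), \tag{9.2c}$$

with the impulse response

$$h^{(J)}_n = \frac{1}{2^{J/2}}\left(\sum_{k=0}^{2^{J-1}-1} \delta_{n-k} - \sum_{k=2^{J-1}}^{2^J-1} \delta_{n-k}\right). \tag{9.2d}$$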
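For the Haar case, the closed forms (9.2b) and (9.2d) can be compared against the recursion (9.2c) for several values of $J$. The Python sketch below is one way to do so; the function names are illustrative, and floating-point comparisons use `math.isclose`.

```python
import math

def haar_scaling(J):
    """(9.2b): g^(J)_n = 2^(-J/2) for n = 0, ..., 2^J - 1."""
    return [2 ** (-J / 2)] * 2**J

def haar_wavelet(J):
    """(9.2d): +2^(-J/2) on the first 2^(J-1) samples, -2^(-J/2) on the rest."""
    half = 2 ** (J - 1)
    c = 2 ** (-J / 2)
    return [c] * half + [-c] * half

def wavelet_via_recursion(J):
    """(9.2c): H^(J)(z) = (G^(J-1)(z) - z^(-2^(J-1)) G^(J-1)(z)) / sqrt(2)."""
    prev = haar_scaling(J - 1)
    shift = 2 ** (J - 1)                 # equals len(prev)
    out = [0.0] * 2**J
    for n, c in enumerate(prev):
        out[n] += c / math.sqrt(2)
        out[n + shift] -= c / math.sqrt(2)
    return out

for J in range(1, 7):
    closed, recursive = haar_wavelet(J), wavelet_via_recursion(J)
    assert all(math.isclose(a, b) for a, b in zip(closed, recursive))
    # Both the scaling and the wavelet sequence have unit norm at every level.
    assert math.isclose(sum(c * c for c in haar_scaling(J)), 1.0)
    assert math.isclose(sum(c * c for c in closed), 1.0)
print("closed forms (9.2b)/(9.2d) agree with the recursion (9.2c) for J = 1..6")
```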



Basis Sequences As we have done in Chapter 7, we now identify the resulting expansion and the corresponding basis sequences. To each branch in Figure 9.3 corresponds a subspace spanned by the appropriate basis sequences. Let us start from the top. The first channel, with input $\beta^{(1)}$, has the equivalent filter $h^{(1)} = h$, just as in the basic two-channel filter bank, (7.10b), with upsampling by 2 in front. The corresponding sequences spanning the subspace $W^{(1)}$ are (Figure 9.4(a)):

$$W^{(1)} = \operatorname{span}(\{h^{(1)}_{n-2k}\}_{k\in\mathbb{Z}}). \tag{9.3a}$$



The second channel, with input $\beta^{(2)}$, has the equivalent filter (9.1b) with upsampling by 4 in front. The corresponding sequences spanning the subspace $W^{(2)}$ are (Figure 9.4(b)):

$$W^{(2)} = \operatorname{span}(\{h^{(2)}_{n-4k}\}_{k\in\mathbb{Z}}). \tag{9.3b}$$

The third and fourth channels, with inputs $\beta^{(3)}$ and $\alpha^{(3)}$, have the equivalent filters (9.1c) and (9.1d), respectively, with upsampling by 8 in front. The corresponding sequences spanning the subspaces $W^{(3)}$ and $V^{(3)}$ are (Figure 9.4(c), (d)):

$$W^{(3)} = \operatorname{span}(\{h^{(3)}_{n-8k}\}_{k\in\mathbb{Z}}), \tag{9.3c}$$
Figure 9.4: Discrete-time Haar basis. Eight of the basis sequences forming $\Phi$: (a) level $\ell = 1$, $h^{(1)}_n$ and three of its shifts by 2, (b) level $\ell = 2$, $h^{(2)}_n$ and one of its shifts by 4, (c) level $\ell = 3$, $h^{(3)}_n$, and (d) level $\ell = 3$, $g^{(3)}_n$. A basis sequence at level $i$ is orthogonal to a basis sequence at level $j$, $i < j$, because it changes sign over an interval where the latter is constant (see, for example, the blue basis sequences).



$$V^{(3)} = \operatorname{span}(\{g^{(3)}_{n-8k}\}_{k\in\mathbb{Z}}). \tag{9.3d}$$

The complete set of basis sequences is thus:

$$\Phi = \{h^{(1)}_{n-2k},\; h^{(2)}_{n-4k},\; h^{(3)}_{n-8k},\; g^{(3)}_{n-8k}\}_{k\in\mathbb{Z}}. \tag{9.3e}$$



Orthogonality of Basis Sequences While we have called the above sequences basis sequences, we have not yet established that they indeed form a basis (although this is almost obvious from the two-channel filter bank discussion).

The sets spanning $W^{(1)}$, $W^{(2)}$, $W^{(3)}$ and $V^{(3)}$ are all orthonormal sets, as the sequences within each of those sets do not overlap. To show that $\Phi$ is an orthonormal set, we must also show that sequences from different subsets are orthogonal to each other. To prove that, we have to show that $h^{(1)}$ and its shifts by 2 are orthogonal to $h^{(2)}$ and its shifts by 4, $h^{(3)}$ and its shifts by 8, and $g^{(3)}$ and its shifts by 8. Similarly, we must show that $h^{(2)}$ and its shifts by 4 are orthogonal to $h^{(3)}$ and its shifts by 8 and $g^{(3)}$ and its shifts by 8, and so on. For Haar filters, this can be done by observing, for example, that $h^{(1)}$ and its shifts by 2 always overlap a constant portion of $h^{(2)}$, $h^{(3)}$ and $g^{(3)}$, leading to a zero inner product (see Figure 9.4). With more general filters, this proof is more involved and will be considered later in the chapter.

To prove completeness, we first introduce the matrix view of this expansion. 
Matrix View We have already seen that while $g^{(3)}$ and $h^{(3)}$ move in steps of 8, $h^{(2)}$ moves in steps of 4 and $h^{(1)}$ moves in steps of 2. That is, during the nonzero portion of $g^{(3)}$ and $h^{(3)}$, $h^{(2)}$ and its shift by 4 occur, as well as $h^{(1)}$ and its shifts by 2, 4 and 6 (see Figure 9.4). Thus, as in Chapter 7, we can describe the action of the filter bank via an infinite matrix:

$$\Phi = \operatorname{diag}(\Phi_0), \tag{9.4}$$

with $\Phi_0$ the $8 \times 8$ block whose columns are the eight basis sequences restricted to $n = 0, 1, \ldots, 7$: $h^{(1)}_n$ and its shifts by 2, 4 and 6, $h^{(2)}_n$ and its shift by 4, $h^{(3)}_n$, and $g^{(3)}_n$.
As before, $\Phi$ is block diagonal only when the length of the filters in the original filter bank is equal to the downsampling factor, as is the case for Haar. The block is of size $8 \times 8$ in this case, since the same structure repeats itself every 8 samples. That is, $h^{(3)}$ and $g^{(3)}$ repeat every 8 samples, $h^{(2)}$ repeats every 4 samples, while $h^{(1)}$ repeats every 2 samples. Thus, there will be 2 instances of $h^{(2)}$ in block $\Phi_0$ and 4 instances of $h^{(1)}$ (see Figure 9.4). The basis sequences are the columns of the matrix $\Phi$ at the center block $\Phi_0$ and all their shifts by 8 (which correspond to other blocks $\Phi_0$ in $\Phi$). $\Phi$ is unitary, as each block $\Phi_0$ is unitary, proving completeness for the Haar case. As we shall see, if each two-channel filter bank is orthonormal, even for longer filters, the orthonormality property will hold in general.
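The unitarity of $\Phi_0$ can be confirmed numerically. This Python sketch builds the eight columns of $\Phi_0$ from the Haar equivalent filters (9.1a)-(9.1d) and checks that their Gram matrix is the identity; the column order used here is our choice and does not affect the result, and the helper name `column` is illustrative.

```python
import math

def column(seq, shift, length=8):
    """A length-8 column holding seq delayed by `shift` samples."""
    col = [0.0] * length
    for n, c in enumerate(seq):
        col[n + shift] = c
    return col

s = 1 / math.sqrt(2)
h1 = [s, -s]                          # h^(1), (9.1a)
h2 = [0.5, 0.5, -0.5, -0.5]           # h^(2), (9.1b)
h3 = [s / 2] * 4 + [-s / 2] * 4       # h^(3), (9.1c)
g3 = [s / 2] * 8                      # g^(3), (9.1d)

# Columns of Phi_0: h^(1) shifted by 0, 2, 4, 6; h^(2) shifted by 0, 4; h^(3); g^(3).
phi0 = ([column(h1, k) for k in (0, 2, 4, 6)]
        + [column(h2, k) for k in (0, 4)]
        + [column(h3, 0), column(g3, 0)])

# Gram matrix of the columns; it is the identity iff Phi_0^T Phi_0 = I.
gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in phi0] for ci in phi0]
unitary = all(math.isclose(gram[i][j], float(i == j), abs_tol=1e-12)
              for i in range(8) for j in range(8))
print("Phi_0 is unitary:", unitary)
```

Since $\Phi_0$ is square, orthonormal columns are enough for it to be unitary.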

Projection Properties In summary, the 3-level iterated two-channel filter bank from Figure 9.1 splits the original space $\ell^2(\mathbb{Z})$ into four subspaces:

$$\ell^2(\mathbb{Z}) = V^{(3)} \oplus W^{(3)} \oplus W^{(2)} \oplus W^{(1)},$$

given in (9.3a)-(9.3d), again, a property that will hold in general. Figure 9.5 illustrates this split, where $x_g^{(3)}$ denotes the projection onto $V^{(3)}$, and $x_h^{(i)}$ denotes the projection onto $W^{(i)}$, $i = 1, 2, 3$, while Figure 9.6 shows an example input sequence and the resulting channel sequences. The low-frequency sinusoid and the polynomial pieces are captured by the lowpass projection, white noise is apparent in all channels, and the effects of discontinuities are localized in the bandpass channels.
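The four-way split can also be reproduced on a short test sequence. The Python sketch below runs a 3-level Haar analysis, resynthesizes one channel at a time to obtain $x_g^{(3)}$ and $x_h^{(1)}, x_h^{(2)}, x_h^{(3)}$, and checks that the projections sum back to the input. It assumes the input length is a multiple of $2^J$; the test sequence is hypothetical, not the one in Figure 9.6, and all function names are illustrative.

```python
import math

S = 1 / math.sqrt(2)   # Haar normalization

def haar_analysis(x):
    """One Haar stage: alpha_k = <x, g_{n-2k}>, beta_k = <x, h_{n-2k}>."""
    half = range(len(x) // 2)
    alpha = [S * (x[2 * k] + x[2 * k + 1]) for k in half]
    beta = [S * (x[2 * k] - x[2 * k + 1]) for k in half]
    return alpha, beta

def haar_synthesis(alpha, beta):
    """Inverse stage: x_n = sum_k alpha_k g_{n-2k} + beta_k h_{n-2k}."""
    x = []
    for a, b in zip(alpha, beta):
        x += [S * (a + b), S * (a - b)]
    return x

def zeros_like(v):
    return [0.0] * len(v)

def synthesize(alpha, betas):
    """Run the synthesis tree from alpha^(J) and [beta^(1), ..., beta^(J)]."""
    x = list(alpha)
    for beta in reversed(betas):
        x = haar_synthesis(x, beta)
    return x

def projections(x, J=3):
    """x_g^(J) and [x_h^(1), ..., x_h^(J)] for a length divisible by 2^J."""
    alpha, betas = list(x), []
    for _ in range(J):
        alpha, beta = haar_analysis(alpha)
        betas.append(beta)                     # betas[l - 1] holds beta^(l)
    x_g = synthesize(alpha, [zeros_like(b) for b in betas])
    x_h = [synthesize(zeros_like(alpha),
                      [b if i == l else zeros_like(b) for i, b in enumerate(betas)])
           for l in range(J)]
    return x_g, x_h

# Hypothetical input: a low-frequency sinusoid plus a step at n = 11.
x = [math.sin(0.3 * n) + (1.0 if n >= 11 else 0.0) for n in range(32)]
x_g, x_h = projections(x)
recon = [x_g[n] + sum(p[n] for p in x_h) for n in range(len(x))]
print(max(abs(r - v) for r, v in zip(recon, x)))   # round-off level
```

The same projections are also mutually orthogonal, as the direct sum above requires.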

Chapter Outline 

After this brief introduction, the structure of the chapter follows naturally. First, 
we generalize the Haar discussion and consider tree-structured filter banks, and, 
Figure 9.5: Projection of the input sequence onto $V^{(3)}$, $W^{(3)}$, $W^{(2)}$ and $W^{(1)}$, respectively, and perfect reconstruction as the sum of the projections.

Figure 9.6: Approximation properties of the discrete wavelet transform. (a) Original sequence $x$ with various components. The highpass approximation after the (b) first iteration $x_h^{(1)}$, (c) second iteration $x_h^{(2)}$, and (d) third iteration $x_h^{(3)}$. (e) The lowpass approximation $x_g^{(3)}$.
in particular, those that will lead to the DWT in Section 9.2. In Section 9.3 we study the orthogonal DWT and its properties, such as approximation and projection capabilities. Section 9.4 discusses a variation on the DWT, the biorthogonal DWT, while Section 9.5 discusses another one, wavelet packets, which allow for rather arbitrary tilings of the time-frequency plane. We follow with computational aspects in Section 9.6.

9.2 Tree-Structured Filter Banks 

Out of the basic building blocks of a two-channel filter bank (Chapter 7) and an $N$-channel filter bank (Chapter 8), we can build many different representations (Figure 9.7 shows some of the options together with the associated time-frequency tilings). We now set the stage by showing how to compute equivalent filters in such filter banks, as a necessary step towards building an orthogonal DWT in the next section, a biorthogonal DWT in Section 9.4, and wavelet packets in Section 9.5. We will assume that we iterate only orthogonal two-channel filter banks (the analysis is parallel for biorthogonal and/or $N$-channel filter banks), and that we do so $J$ times. We consider the equivalent filter along the lowpass branches separately, followed by the bandpass branches, and finally, the relationship between the lowpass and bandpass ones. While we could make the discussion more general, we consider bandpass channels to be only those iterated through lowpass branches until the last step, when a final iteration is through a highpass one, as is the case for the DWT (iterations through arbitrary combinations of lowpass and highpass branches would follow similarly).

9.2.1 The Lowpass Channel and Its Properties 

We start with the lowpass channel iterated $J$ times, leading to $g^{(J)}$. Using the same identity to move the filter past the upsampler as in Figure 9.2, a cascade of $J$ times upsampling and filtering by $G(z)$ leads to upsampling by $2^J$ followed by filtering with the equivalent filter

$$G^{(J)}(z) = G(z)G(z^2)G(z^4)\cdots G(z^{2^{J-1}}) = \prod_{\ell=0}^{J-1} G(z^{2^\ell}), \tag{9.5a}$$

as shown in Figure 9.8. If $g$ is of length $L$, then $g^{(J)}$ is of length

$$L^{(J)} = (L - 1)(2^J - 1) + 1 \le (L - 1)2^J, \tag{9.5b}$$

with equality only for $L = 2$. Moreover, we see that

$$G^{(J)}(z) = G(z)\,G^{(J-1)}(z^2) = G^{(J-1)}(z)\,G(z^{2^{J-1}}). \tag{9.5c}$$

Some other recursive relations are given in Exercise 9.2.
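The length formula (9.5b) and the two recursions in (9.5c) are easy to check by direct polynomial multiplication. The following Python sketch uses a hypothetical integer filter of length $L = 4$ so that the comparisons are exact; `iterated_lowpass` is an illustrative name for the product in (9.5a).

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences (product of z-transforms)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(x, m):
    """Coefficients of X(z^m)."""
    y = [0] * (m * (len(x) - 1) + 1)
    y[::m] = x
    return y

def iterated_lowpass(g, J):
    """(9.5a): coefficients of G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l)); G^(0)(z) = 1."""
    out = [1]
    for l in range(J):
        out = convolve(out, upsample(g, 2**l))
    return out

g = [1, 3, 3, 1]   # hypothetical length-4 filter; integers keep the check exact
L = len(g)
for J in range(1, 7):
    gJ = iterated_lowpass(g, J)
    assert len(gJ) == (L - 1) * (2**J - 1) + 1 <= (L - 1) * 2**J           # (9.5b)
    assert gJ == convolve(g, upsample(iterated_lowpass(g, J - 1), 2))         # (9.5c), first form
    assert gJ == convolve(iterated_lowpass(g, J - 1), upsample(g, 2**(J - 1)))  # second form
print(len(iterated_lowpass(g, 6)))   # 190 = 3 * 63 + 1
```

The last line reproduces the length $L^{(6)} = 190$ used later for the length-4 Daubechies filter.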

Orthogonality of the Lowpass Filter We can use the orthogonality of the basic 
two-channel synthesis building block to show the orthogonality of the synthesis 
Figure 9.7: Filter banks and variations together with the corresponding time-frequency tilings. (a) Two-channel filter bank (Chapter 7). (b) $N$-channel filter bank (Chapter 8). (c) The local Fourier transform filter bank (Chapter 8). (d) The DWT tree (present chapter). (e) The wavelet packet filter bank (present chapter). (f) The time-varying filter bank.

Figure 9.8: Cascade of $J$ times upsampling and filtering. (a) Original system. (b) Equivalent system.
operator obtained by iterating. From this it must be true that the iterated lowpass filter is orthogonal to its translates by $2^J$. As in Section 7.2.1, we summarize the orthogonality properties here (as the proof is just a straightforward, albeit tedious, use of multirate operations, we leave it as Exercise 9.1):

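$$\begin{aligned}
\langle g^{(J)}_n, g^{(J)}_{n-2^J k}\rangle_n = \delta_k
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^J}(G^{(J)})^T G^{(J)} U_{2^J} = I \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^J-1} G^{(J)}(W_{2^J}^k z)\,G^{(J)}(W_{2^J}^{-k} z^{-1}) = 2^J \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^J-1} \bigl|G^{(J)}(W_{2^J}^k e^{j\omega})\bigr|^2 = 2^J
\end{aligned} \tag{9.6a}$$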
In the matrix view, we have used linear operators (infinite matrices) introduced in Section 2.7: (1) downsampling by $2^J$, $D_{2^J}$, from (2.183); (2) upsampling by $2^J$, $U_{2^J}$, from (2.188); and (3) filtering by $G^{(J)}$, from (2.63). The matrix view expresses the fact that the columns of $G^{(J)}U_{2^J}$ form an orthonormal set. The DTFT version is a version of the quadrature mirror formula we have seen in (2.208). This filter is a $2^J$th-band filter (an ideal $2^J$th-band filter would be bandlimited to $|\omega| < \pi/2^J$; see Figure 9.1(c) with $J = 3$).

Let us check the $z$-transform version of the above for $J = 2$:

$$\begin{aligned}
\sum_{k=0}^{3} G^{(2)}(W_4^k z)\,G^{(2)}(W_4^{-k} z^{-1})
&= G^{(2)}(z)G^{(2)}(z^{-1}) + G^{(2)}(jz)G^{(2)}(-jz^{-1}) \\
&\quad + G^{(2)}(-z)G^{(2)}(-z^{-1}) + G^{(2)}(-jz)G^{(2)}(jz^{-1}) \\
&\stackrel{(a)}{=} G(z)G(z^2)G(z^{-1})G(z^{-2}) + G(jz)G(-z^2)G(-jz^{-1})G(-z^{-2}) \\
&\quad + G(-z)G(z^2)G(-z^{-1})G(z^{-2}) + G(-jz)G(-z^2)G(jz^{-1})G(-z^{-2}) \\
&\stackrel{(b)}{=} G(z^2)G(z^{-2})\bigl(G(z)G(z^{-1}) + G(-z)G(-z^{-1})\bigr) \\
&\quad + G(-z^2)G(-z^{-2})\bigl(G(jz)G(-jz^{-1}) + G(-jz)G(jz^{-1})\bigr) \\
&\stackrel{(c)}{=} 2\bigl(G(z^2)G(z^{-2}) + G(-z^2)G(-z^{-2})\bigr) \stackrel{(d)}{=} 4,
\end{aligned}$$

where (a) follows from the expression for the equivalent lowpass filter at level 2, (9.5a); in (b) we pulled out the common terms $G(z^2)G(z^{-2})$ and $G(-z^2)G(-z^{-2})$; and (c) and (d) follow from the orthogonality of the lowpass filter $g$, (7.13). This, of course, is to be expected, because we have done nothing else but concatenate orthogonal filter banks, which we know already implement orthonormal bases, and thus must satisfy orthogonality properties.
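The orthogonality conditions (9.6a) can be spot-checked numerically for a non-Haar filter. The Python sketch below iterates the length-4 Daubechies filter (9.10) three times and evaluates both the time-domain inner products with shifts by $2^J$ (equivalently, the deterministic autocorrelation at multiples of $2^J$) and the DTFT sum at one arbitrarily chosen frequency. The helper names are ours.

```python
import cmath
import math

def convolve(a, b):
    """Full linear convolution (product of z-transforms)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(x, m):
    """Coefficients of X(z^m)."""
    y = [0.0] * (m * (len(x) - 1) + 1)
    y[::m] = x
    return y

def iterated_lowpass(g, J):
    """(9.5a): G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l))."""
    out = [1.0]
    for l in range(J):
        out = convolve(out, upsample(g, 2**l))
    return out

r2, r3 = math.sqrt(2), math.sqrt(3)
g = [c / (4 * r2) for c in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]   # (9.10)

def shifted_inner(f, shift):
    """<f_n, f_{n-shift}>_n for f supported on 0..len(f)-1 and shift >= 0."""
    return sum(f[n] * f[n - shift] for n in range(shift, len(f)))

def dtft(f, w):
    """F(e^{jw}) = sum_n f_n e^{-jwn}."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(f))

J = 3
M = 2**J
gJ = iterated_lowpass(g, J)                    # length (4 - 1)(2^3 - 1) + 1 = 22
corr = [shifted_inner(gJ, M * k) for k in range(len(gJ) // M + 1)]
w = 0.37                                       # arbitrary test frequency
power = sum(abs(dtft(gJ, w + 2 * math.pi * k / M)) ** 2 for k in range(M))
print([round(c, 9) for c in corr], round(power, 9))   # approximately [1, 0, 0] and 8
```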

Deterministic Autocorrelation of the Lowpass Filter As we have done in Chapter 7, we rephrase the above results in terms of the deterministic autocorrelation of the filter. This is also what we use to prove (9.6a) in Exercise 9.1. The deterministic
Lowpass Channel in a J-Level Octave-Band Orthogonal Filter Bank

Lowpass filter
Original domain | $g^{(J)}_n$ | $\langle g^{(J)}_n, g^{(J)}_{n-2^J k}\rangle_n = \delta_k$
Matrix domain | $G^{(J)}$ | $D_{2^J}(G^{(J)})^T G^{(J)} U_{2^J} = I$
z-domain | $G^{(J)}(z) = \prod_{\ell=0}^{J-1} G(z^{2^\ell})$ | $\sum_{k=0}^{2^J-1} G^{(J)}(W_{2^J}^k z)\,G^{(J)}(W_{2^J}^{-k} z^{-1}) = 2^J$
DTFT domain | $G^{(J)}(e^{j\omega})$ | $\sum_{k=0}^{2^J-1} \lvert G^{(J)}(W_{2^J}^k e^{j\omega})\rvert^2 = 2^J$

Deterministic autocorrelation
Original domain | $a^{(J)}_n = \langle g^{(J)}_k, g^{(J)}_{k+n}\rangle_k$ | $a^{(J)}_{2^J k} = \delta_k$
Matrix domain | $A^{(J)} = (G^{(J)})^T G^{(J)}$ | $D_{2^J} A^{(J)} U_{2^J} = I$
z-domain | $A^{(J)}(z) = G^{(J)}(z)G^{(J)}(z^{-1})$ | $\sum_{k=0}^{2^J-1} A^{(J)}(W_{2^J}^k z) = 2^J$
DTFT domain | $A^{(J)}(e^{j\omega}) = \lvert G^{(J)}(e^{j\omega})\rvert^2$ | $\sum_{k=0}^{2^J-1} A^{(J)}(W_{2^J}^k e^{j\omega}) = 2^J$

Orthogonal projection onto smooth space $V^{(J)} = \operatorname{span}(\{g^{(J)}_{n-2^J k}\}_{k\in\mathbb{Z}})$
$x_{V^{(J)}} = P_{V^{(J)}}\, x$ | $P_{V^{(J)}} = G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T$

Table 9.1: Properties of the lowpass channel in an orthogonal J-level octave-band filter bank.



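autocorrelation of $g^{(J)}$ is denoted by $a^{(J)}$:

$$\begin{aligned}
\langle g^{(J)}_n, g^{(J)}_{n-2^J k}\rangle_n = a^{(J)}_{2^J k} = \delta_k
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^J} A^{(J)} U_{2^J} = I \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^J-1} A^{(J)}(W_{2^J}^k z) = 2^J \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^J-1} A^{(J)}(W_{2^J}^k e^{j\omega}) = 2^J
\end{aligned} \tag{9.6b}$$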



Orthogonal Projection Property of the Lowpass Channel We now look at the lowpass channel as a composition of four linear operators we just saw:

$$x_{V^{(J)}} = P_{V^{(J)}}\, x = G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T x. \tag{9.7}$$

As before, the notation is evocative of projection onto $V^{(J)}$, and we will now show that the lowpass channel accomplishes precisely this. Using (9.6a), we check idempotency and self-adjointness of $P_{V^{(J)}}$ (Definition 1.21).

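$$\begin{aligned}
P_{V^{(J)}}^2 &= \bigl(G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T\bigr)\bigl(G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T\bigr) \stackrel{(a)}{=} G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T = P_{V^{(J)}}, \\
P_{V^{(J)}}^* &= \bigl(G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T\bigr)^T = G^{(J)} (U_{2^J} D_{2^J})^T (G^{(J)})^T \stackrel{(b)}{=} G^{(J)} U_{2^J} D_{2^J} (G^{(J)})^T = P_{V^{(J)}},
\end{aligned}$$

where (a) follows from (9.6a) and (b) from (2.190). Indeed, $P_{V^{(J)}}$ is an orthogonal projection operator, with the range given in:

$$V^{(J)} = \operatorname{span}(\{g^{(J)}_{n-2^J k}\}_{k\in\mathbb{Z}}). \tag{9.8}$$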

The summary of properties of the lowpass channel is given in Table 9.1.
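To see the projection property (9.7)-(9.8) on a computer, one has to work with finite sequences. The Python sketch below assumes a circular (periodic) extension of length $N = 16$, an assumption not made in the text, under which $G^{(J)}U_{2^J}$ becomes an $N \times N/2^J$ matrix whose columns are circularly shifted copies of $g^{(J)}$. It then forms $P = G^{(J)}U_{2^J}D_{2^J}(G^{(J)})^T$ and checks $P^2 = P$ and $P^T = P$ for the Daubechies filter (9.10) with $J = 2$. All helper names are illustrative.

```python
import math

def convolve(a, b):
    """Full linear convolution (product of z-transforms)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(x, m):
    """Coefficients of X(z^m)."""
    y = [0.0] * (m * (len(x) - 1) + 1)
    y[::m] = x
    return y

def iterated_lowpass(g, J):
    """(9.5a): G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l))."""
    out = [1.0]
    for l in range(J):
        out = convolve(out, upsample(g, 2**l))
    return out

def periodized_column(seq, shift, N):
    """seq delayed by `shift` samples and wrapped onto 0..N-1 (circular extension)."""
    col = [0.0] * N
    for n, c in enumerate(seq):
        col[(n + shift) % N] += c
    return col

r2, r3 = math.sqrt(2), math.sqrt(3)
g = [c / (4 * r2) for c in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]   # (9.10)

N, J = 16, 2
M = 2**J
gJ = iterated_lowpass(g, J)
# Columns of G^(J) U_{2^J} over one period: g^(J) shifted by multiples of 2^J.
cols = [periodized_column(gJ, M * k, N) for k in range(N // M)]
# P = (G^(J) U)(G^(J) U)^T = G^(J) U D (G^(J))^T, since D = U^T.
P = [[sum(c[i] * c[j] for c in cols) for j in range(N)] for i in range(N)]
P2 = [[sum(P[i][m] * P[m][j] for m in range(N)) for j in range(N)] for i in range(N)]

idempotent = all(math.isclose(P2[i][j], P[i][j], abs_tol=1e-12)
                 for i in range(N) for j in range(N))
self_adjoint = all(P[i][j] == P[j][i] for i in range(N) for j in range(N))
print(idempotent, self_adjoint)
```

The trace of $P$ equals $N/2^J = 4$, the dimension of the periodized $V^{(J)}$.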

9.2.2 Bandpass Channels and Their Properties 

While we have only one lowpass filter, in a $J$-level octave-band filter bank leading to the DWT, we also have $J$ bandpass filters, ideally, each bandlimited to $\pi/2^{\ell+1} \le |\omega| < \pi/2^\ell$, for $\ell = 0, 1, \ldots, J - 1$, as in Figure 9.1(c) with $J = 3$. The analysis of an iterated filter bank constructed through arbitrary combinations of lowpass and highpass branches would follow similarly.

The filter $H^{(\ell)}(z)$ corresponds to a branch with a highpass filter followed by $(\ell - 1)$ lowpass filters (always with upsampling by 2 in between). The $(\ell - 1)$th lowpass filter branch has an equivalent filter $G^{(\ell-1)}(z)$ as in (9.5a), preceded by upsampling by $2^{\ell-1}$. Passing this upsampling across the initial highpass filter changes $H(z)$ into $H(z^{2^{\ell-1}})$, and

$$H^{(\ell)}(z) = H(z^{2^{\ell-1}})\,G^{(\ell-1)}(z), \qquad \ell = 1, \ldots, J, \tag{9.9}$$

follows. The basis vectors correspond to the impulse responses and the shifts given by the upsampling factors. These upsampling factors are $2^J$ for the lowpass branch and $2^\ell$ for the bandpass branches.

Example 9.1 (The 3-level octave-band Daubechies filter bank) We continue Example 7.3, the Daubechies filter with two zeros at $z = -1$:

$$G(z) = \frac{1}{4\sqrt{2}}\left[(1 + \sqrt{3}) + (3 + \sqrt{3})z^{-1} + (3 - \sqrt{3})z^{-2} + (1 - \sqrt{3})z^{-3}\right]. \tag{9.10}$$

A 3-level octave-band filter bank has as basis sequences the impulse responses of:

$$\begin{aligned}
H^{(1)}(z) &= H(z) = -z^{-3}G(-z^{-1}), \\
H^{(2)}(z) &= G(z)H(z^2), \\
H^{(3)}(z) &= G(z)G(z^2)H(z^4), \\
G^{(3)}(z) &= G(z)G(z^2)G(z^4),
\end{aligned}$$

together with shifts by multiples of 2, 4, 8 and 8, respectively (see Figure 9.9).
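The orthonormality of the basis in Example 9.1 can be checked on a finite window of basis sequences. The Python sketch below builds $h^{(1)}, h^{(2)}, h^{(3)}$ and $g^{(3)}$ from (9.10), with the highpass filter taken as $h_n = (-1)^n g_{3-n}$ (which is what $H(z) = -z^{-3}G(-z^{-1})$ gives), and measures how far the Gram matrix of a few shifts of each sequence deviates from the identity. The window of shifts is an arbitrary choice, and the helper names are ours.

```python
import math

def convolve(a, b):
    """Full linear convolution (product of z-transforms)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(x, m):
    """Coefficients of X(z^m)."""
    y = [0.0] * (m * (len(x) - 1) + 1)
    y[::m] = x
    return y

r2, r3 = math.sqrt(2), math.sqrt(3)
g = [c / (4 * r2) for c in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]   # (9.10)
h = [(-1) ** n * g[3 - n] for n in range(4)]                  # H(z) = -z^{-3} G(-z^{-1})

g2 = convolve(g, upsample(g, 2))      # G(z) G(z^2)
h1 = h                                # H^(1)(z) = H(z)
h2 = convolve(g, upsample(h, 2))      # H^(2)(z) = G(z) H(z^2)
h3 = convolve(g2, upsample(h, 4))     # H^(3)(z) = G(z) G(z^2) H(z^4)
g3 = convolve(g2, upsample(g, 4))     # G^(3)(z) = G(z) G(z^2) G(z^4)

def inner(f, a, q, b):
    """<f_{n-a}, q_{n-b}>_n for finite sequences stored from index 0."""
    return sum(c * q[n + a - b] for n, c in enumerate(f) if 0 <= n + a - b < len(q))

# A finite window of the basis: each sequence with a few of its admissible shifts.
basis = ([(h1, 2 * k) for k in range(-4, 5)] + [(h2, 4 * k) for k in range(-2, 3)]
         + [(h3, 8 * k) for k in range(-1, 2)] + [(g3, 8 * k) for k in range(-1, 2)])
worst = max(abs(inner(f, a, q, b) - (1.0 if i == j else 0.0))
            for i, (f, a) in enumerate(basis) for j, (q, b) in enumerate(basis))
print(f"largest deviation from orthonormality: {worst:.1e}")
```

The deviation is at the level of floating-point round-off, in line with (9.11a), (9.11b) and (9.14a).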
[Figure 9.9, panels (a) to (d); see caption.]

Figure 9.9: Basis sequences for a 3-level octave-band filter bank based on a Daubechies orthonormal length-4 filter with two zeros at $z = -1$, as in (9.10). The basis sequences are $h^{(1)}$, $h^{(2)}$, $h^{(3)}$ and $g^{(3)}$, together with shifts by multiples of 2, 4, 8 and 8, respectively. We show (a) $h^{(1)}_n, h^{(1)}_{n-2}, h^{(1)}_{n-4}, h^{(1)}_{n-6}$, (b) $h^{(2)}_n, h^{(2)}_{n-4}$, (c) $h^{(3)}_n$, and (d) $g^{(3)}_n$.



Orthogonality of an Individual Bandpass Filter Unlike in the simple two-channel case, each bandpass filter $h^{(\ell)}$ is now orthogonal to its shifts by $2^\ell$, and to all the other bandpass filters $h^{(j)}$, $\ell \ne j$, as well as to their shifts by $2^j$. We expect these properties to hold as they hold in the basic building block, but state them nevertheless. While we could state the orthogonality properties for a single level and across levels together, we separate them for clarity. All the proofs are left for Exercise 9.1.



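$$\begin{aligned}
\langle h^{(\ell)}_n, h^{(\ell)}_{n-2^\ell k}\rangle_n = \delta_k
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^\ell}(H^{(\ell)})^T H^{(\ell)} U_{2^\ell} = I \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} H^{(\ell)}(W_{2^\ell}^k z)\,H^{(\ell)}(W_{2^\ell}^{-k} z^{-1}) = 2^\ell \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} \bigl|H^{(\ell)}(W_{2^\ell}^k e^{j\omega})\bigr|^2 = 2^\ell
\end{aligned} \tag{9.11a}$$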



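Orthogonality of Different Bandpass Filters Without loss of generality, let us assume that $\ell < j$. We summarize the orthogonality properties of the bandpass filter $h^{(\ell)}$ and its shifts by $2^\ell$ to the bandpass filter $h^{(j)}$ and its shifts by $2^j$:

$$\begin{aligned}
\langle h^{(\ell)}_{n-2^\ell k}, h^{(j)}_{n-2^j k'}\rangle_n = 0
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^j}(H^{(j)})^T H^{(\ell)} U_{2^\ell} = 0 \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} H^{(\ell)}(W_{2^\ell}^k z)\,H^{(j)}(W_{2^\ell}^{-k} z^{-1}) = 0 \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} H^{(\ell)}(W_{2^\ell}^k e^{j\omega})\,H^{(j)}(W_{2^\ell}^{-k} e^{-j\omega}) = 0
\end{aligned} \tag{9.11b}$$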
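Deterministic Autocorrelation of Individual Bandpass Filters

$$\begin{aligned}
\langle h^{(\ell)}_n, h^{(\ell)}_{n-2^\ell k}\rangle_n = a^{(\ell)}_{2^\ell k} = \delta_k
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^\ell} A^{(\ell)} U_{2^\ell} = I \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} A^{(\ell)}(W_{2^\ell}^k z) = 2^\ell \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} A^{(\ell)}(W_{2^\ell}^k e^{j\omega}) = 2^\ell
\end{aligned} \tag{9.11c}$$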

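Deterministic Crosscorrelation of Different Bandpass Filters Again without loss of generality, we assume that $\ell < j$.

$$\begin{aligned}
\langle h^{(\ell)}_n, h^{(j)}_{n+2^\ell k}\rangle_n = c^{(\ell,j)}_{2^\ell k} = 0
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^j} C^{(\ell,j)} U_{2^\ell} = 0 \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} C^{(\ell,j)}(W_{2^\ell}^k z) = 0 \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} C^{(\ell,j)}(W_{2^\ell}^k e^{j\omega}) = 0
\end{aligned} \tag{9.11d}$$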



Orthogonal Projection Property of Bandpass Channels The lowpass channel computes a projection onto a space of coarse sequences spanned by $g^{(J)}$ and its shifts by $2^J$. Similarly, each bandpass channel computes a projection onto a space of detail sequences spanned by each of $h^{(\ell)}$ and its shifts by $2^\ell$, for $\ell = 1, 2, \ldots, J$. That is, we have $J$ bandpass projection operators, computing bandpass projections:

$$x_{W^{(\ell)}} = P_{W^{(\ell)}}\, x = H^{(\ell)} U_{2^\ell} D_{2^\ell} (H^{(\ell)})^T x, \tag{9.12}$$

for $\ell = 1, 2, \ldots, J$. That $P_{W^{(\ell)}}$ is an orthogonal projection operator is easy to show; follow the same path as for the lowpass filter. Each bandpass space is given by:

$$W^{(\ell)} = \operatorname{span}(\{h^{(\ell)}_{n-2^\ell k}\}_{k\in\mathbb{Z}}). \tag{9.13}$$

9.2.3 Relationship between Lowpass and Bandpass Channels 

The only condition left to show, for the lowpass impulse response and its shifts by $2^J$ and all the bandpass impulse responses and their appropriate shifts to form an orthonormal set, is the orthogonality of the lowpass and bandpass sequences. Since the proofs follow the same path as before, we again leave them for Exercise 9.1.

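Orthogonality of the Lowpass and Bandpass Filters

$$\begin{aligned}
\langle g^{(J)}_{n-2^J k}, h^{(\ell)}_{n-2^\ell k'}\rangle_n = 0
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^\ell}(H^{(\ell)})^T G^{(J)} U_{2^J} = 0 \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} G^{(J)}(W_{2^\ell}^k z)\,H^{(\ell)}(W_{2^\ell}^{-k} z^{-1}) = 0 \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} G^{(J)}(W_{2^\ell}^k e^{j\omega})\,H^{(\ell)}(W_{2^\ell}^{-k} e^{-j\omega}) = 0
\end{aligned} \tag{9.14a}$$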



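Deterministic Crosscorrelation of the Lowpass and Bandpass Filters

$$\begin{aligned}
\langle g^{(J)}_n, h^{(\ell)}_{n+2^\ell k}\rangle_n = c^{(J,\ell)}_{2^\ell k} = 0
&\quad\stackrel{\text{Matrix View}}{\longleftrightarrow}\quad D_{2^\ell} C^{(J,\ell)} U_{2^J} = 0 \\
&\quad\stackrel{\text{ZT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} C^{(J,\ell)}(W_{2^\ell}^k z) = 0 \\
&\quad\stackrel{\text{DTFT}}{\longleftrightarrow}\quad \sum_{k=0}^{2^\ell-1} C^{(J,\ell)}(W_{2^\ell}^k e^{j\omega}) = 0
\end{aligned} \tag{9.14b}$$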
690 Chapter 9. Wavelet Bases on Sequences

9.3 Orthogonal Discrete Wavelet Transform 

Following our introductory Haar example, it is now quite clear what the DWT does: it produces coarse projection coefficients $\alpha^{(J)}$, together with a sequence of ever finer detail projection (wavelet) coefficients $\beta^{(J)}, \beta^{(J-1)}, \ldots, \beta^{(1)}$, using a $J$-level octave-band filter bank as a vehicle. As we have seen in that simple example, the original space is split into a sequence of subspaces, each having a spectrum half the size of the previous one (octave-band decomposition). Such a decomposition is appropriate for smooth sequences with isolated discontinuities (natural images are one example of such signals; there is evidence that the human visual system processes visual information in exactly such a manner).



9.3.1 Definition of the Orthogonal DWT 

We are now ready to formally define the orthogonal DWT: 



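Definition 9.1 (Orthogonal DWT) The $J$-level orthogonal DWT of a sequence $x$ is a function of $\ell \in \{1, 2, \ldots, J\}$ given by

$$\alpha^{(J)}_k = \langle x_n, g^{(J)}_{n-2^J k}\rangle_n = \sum_{n\in\mathbb{Z}} x_n\, g^{(J)}_{n-2^J k}, \qquad k \in \mathbb{Z}, \tag{9.15a}$$

$$\beta^{(\ell)}_k = \langle x_n, h^{(\ell)}_{n-2^\ell k}\rangle_n = \sum_{n\in\mathbb{Z}} x_n\, h^{(\ell)}_{n-2^\ell k}, \qquad \ell \in \{1, 2, \ldots, J\},\; k \in \mathbb{Z}. \tag{9.15b}$$

The inverse DWT is given by

$$x_n = \sum_{k\in\mathbb{Z}} \alpha^{(J)}_k\, g^{(J)}_{n-2^J k} + \sum_{\ell=1}^{J} \sum_{k\in\mathbb{Z}} \beta^{(\ell)}_k\, h^{(\ell)}_{n-2^\ell k}. \tag{9.15c}$$

In the above, the $\alpha^{(J)}$ are the scaling coefficients and the $\beta^{(\ell)}$ are the wavelet coefficients.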

The equivalent filter $g^{(J)}$ is often called the scaling sequence and the $h^{(\ell)}$ wavelets (wavelet sequences), $\ell = 1, 2, \ldots, J$; they are given in (9.5a) and (9.9), respectively, and satisfy (9.6a)-(9.6b), (9.11a)-(9.11d), as well as (9.14a)-(9.14b).

To denote such a DWT pair, we write:

$$x_n \;\stackrel{\text{DWT}}{\longleftrightarrow}\; \alpha^{(J)}_k, \beta^{(J)}_k, \beta^{(J-1)}_k, \ldots, \beta^{(1)}_k.$$



The orthogonal DWT is implemented using a $J$-level octave-band orthogonal filter bank as in Figure 9.1. This particular version of the DWT is called the dyadic DWT, as each subsequent channel has half of the coefficients of the previous one. Various generalizations are possible; for example, Solved Exercise 9.2 considers the DWT obtained from a 3-channel filter bank.
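Definition 9.1 can be implemented by iterating the two-channel analysis stage, as in Figure 9.1. The Python sketch below is a minimal version, not an efficient one: it assumes a finite input whose length is a multiple of $2^J$ and extends it circularly (periodization), a choice the text does not make, and it uses the Daubechies filter (9.10) with $h_n = (-1)^n g_{3-n}$. All function names are illustrative.

```python
import math

def analysis_stage(x, g, h):
    """alpha_k = <x_n, g_{n-2k}>, beta_k = <x_n, h_{n-2k}>, indices mod len(x)."""
    N = len(x)
    alpha = [sum(gm * x[(2 * k + m) % N] for m, gm in enumerate(g)) for k in range(N // 2)]
    beta = [sum(hm * x[(2 * k + m) % N] for m, hm in enumerate(h)) for k in range(N // 2)]
    return alpha, beta

def synthesis_stage(alpha, beta, g, h):
    """x_n = sum_k alpha_k g_{n-2k} + beta_k h_{n-2k}, indices mod 2*len(alpha)."""
    N = 2 * len(alpha)
    x = [0.0] * N
    for k, (a, b) in enumerate(zip(alpha, beta)):
        for m, (gm, hm) in enumerate(zip(g, h)):
            x[(2 * k + m) % N] += a * gm + b * hm
    return x

def dwt(x, g, h, J):
    """J-level DWT (9.15a)-(9.15b): returns (alpha^(J), [beta^(1), ..., beta^(J)])."""
    alpha, betas = list(x), []
    for _ in range(J):
        alpha, beta = analysis_stage(alpha, g, h)
        betas.append(beta)
    return alpha, betas

def idwt(alpha, betas, g, h):
    """Inverse DWT (9.15c), computed one synthesis stage at a time."""
    x = list(alpha)
    for beta in reversed(betas):
        x = synthesis_stage(x, beta, g, h)
    return x

r2, r3 = math.sqrt(2), math.sqrt(3)
g = [c / (4 * r2) for c in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]   # (9.10)
h = [(-1) ** n * g[3 - n] for n in range(4)]                  # H(z) = -z^{-3} G(-z^{-1})

x = [math.cos(0.2 * n) + 0.1 * n for n in range(32)]   # hypothetical input
alpha, betas = dwt(x, g, h, J=3)
x_rec = idwt(alpha, betas, g, h)
print(len(alpha), [len(b) for b in betas])          # 4 [16, 8, 4]
print(max(abs(u - v) for u, v in zip(x, x_rec)))    # round-off level
```

Each synthesis stage is the transpose of the corresponding analysis stage, which is why perfect reconstruction follows from orthogonality alone.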
9.3.2 Properties of the Orthogonal DWT

Some properties of the DWT are rather obvious (such as linearity), while others are more involved (such as shift in time). We now list and study a few of these.



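DWT properties | Time domain | DWT domain
Linearity | $a x_n + b y_n$ | $a\{\alpha^{(J)}_x, \beta^{(J)}_x, \ldots, \beta^{(1)}_x\} + b\{\alpha^{(J)}_y, \beta^{(J)}_y, \ldots, \beta^{(1)}_y\}$
Shift in time | $x_{n-2^J n_0}$ | $\alpha^{(J)}_{k-n_0}, \beta^{(J)}_{k-n_0}, \beta^{(J-1)}_{k-2n_0}, \ldots, \beta^{(1)}_{k-2^{J-1}n_0}$
Parseval's equality | $\lVert x\rVert^2 = \sum_{n\in\mathbb{Z}} \lvert x_n\rvert^2$ | $\lVert\alpha^{(J)}\rVert^2 + \sum_{\ell=1}^{J} \lVert\beta^{(\ell)}\rVert^2$

Table 9.2: Properties of the DWT.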

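Linearity The DWT operator is a linear operator, or,

$$a x_n + b y_n \;\stackrel{\text{DWT}}{\longleftrightarrow}\; a\{\alpha^{(J)}_x, \beta^{(J)}_x, \ldots, \beta^{(1)}_x\} + b\{\alpha^{(J)}_y, \beta^{(J)}_y, \ldots, \beta^{(1)}_y\}. \tag{9.16}$$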

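Shift in Time A shift in time by $2^J n_0$ results in

$$x_{n-2^J n_0} \;\stackrel{\text{DWT}}{\longleftrightarrow}\; \alpha^{(J)}_{k-n_0}, \beta^{(J)}_{k-n_0}, \beta^{(J-1)}_{k-2n_0}, \ldots, \beta^{(1)}_{k-2^{J-1}n_0}. \tag{9.17}$$

This property shows that the DWT is not shift invariant; it is periodically shift variant with period $2^J$.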

Parseval's Equality The DWT operator is a unitary operator and thus preserves the Euclidean norm (see (1.51)):

$$\|x\|^2 = \sum_{n\in\mathbb{Z}} |x_n|^2 = \|\alpha^{(J)}\|^2 + \sum_{\ell=1}^{J} \|\beta^{(\ell)}\|^2. \tag{9.18}$$

Projection After our Haar example, it should come as no surprise that a $J$-level orthogonal DWT projects the input sequence $x$ onto one lowpass space

$$V^{(J)} = \operatorname{span}(\{g^{(J)}_{n-2^J k}\}_{k\in\mathbb{Z}}),$$

and $J$ bandpass spaces

$$W^{(\ell)} = \operatorname{span}(\{h^{(\ell)}_{n-2^\ell k}\}_{k\in\mathbb{Z}}), \qquad \ell = 1, \ldots, J,$$

where $g^{(J)}$ and $h^{(\ell)}$ are the equivalent filters given in (9.5a) and (9.9), respectively. The input space $\ell^2(\mathbb{Z})$ is split into the following $(J + 1)$ spaces:

$$\ell^2(\mathbb{Z}) = V^{(J)} \oplus W^{(J)} \oplus W^{(J-1)} \oplus \cdots \oplus W^{(1)}.$$
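Parseval's equality (9.18) and the shift property (9.17) can both be observed on a periodized Haar DWT. The Python sketch below assumes a circular extension of a length-32 input (our assumption, since the text works on all of $\mathbb{Z}$), checks that a circular shift by $2^J n_0$ only moves the coefficients as in (9.17), and checks that a one-sample shift does not. The function names and the test input are hypothetical.

```python
import math

S = 1 / math.sqrt(2)   # Haar normalization

def haar_dwt(x, J):
    """J-level Haar DWT: returns (alpha^(J), [beta^(1), ..., beta^(J)])."""
    alpha, betas = list(x), []
    for _ in range(J):
        half = range(len(alpha) // 2)
        betas.append([S * (alpha[2 * k] - alpha[2 * k + 1]) for k in half])
        alpha = [S * (alpha[2 * k] + alpha[2 * k + 1]) for k in half]
    return alpha, betas

def circ_shift(v, s):
    """The sequence v_{n-s}, with indices taken modulo len(v)."""
    return [v[(n - s) % len(v)] for n in range(len(v))]

J, n0 = 3, 1
x = [math.sin(0.5 * n) + (n % 5) for n in range(32)]   # hypothetical input
alpha, betas = haar_dwt(x, J)

# Parseval's equality (9.18).
energy_x = sum(v * v for v in x)
energy_c = sum(v * v for v in alpha) + sum(v * v for b in betas for v in b)

# Shift by 2^J n0 (9.17): alpha^(J) moves by n0 and beta^(l) by 2^(J-l) n0.
a_s, b_s = haar_dwt(circ_shift(x, 2**J * n0), J)
shift_ok = a_s == circ_shift(alpha, n0) and all(
    b_s[l - 1] == circ_shift(betas[l - 1], 2**(J - l) * n0) for l in range(1, J + 1))

# A one-sample shift does not just move the coefficients around.
a_1, _ = haar_dwt(circ_shift(x, 1), J)
moved_only = any(a_1 == circ_shift(alpha, s) for s in range(len(alpha)))
print(math.isclose(energy_x, energy_c), shift_ok, moved_only)
```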
Polynomial Approximation As we have done in Section 7.2.5, we now look at polynomial approximation properties of an orthogonal DWT,¹²⁷ under the assumption that the lowpass filter $g$ has $N \ge 1$ zeros at $z = -1$, as in (7.46):

$$G(z) = (1 + z^{-1})^N R(z).$$

Note that $R(z)|_{z=1}$ cannot be zero because of the orthogonality constraint (7.13). Remember that the highpass filter, being a modulated version of the lowpass, has $N$ zeros at $z = 1$. In other words, it annihilates polynomials up to degree $(N - 1)$, since it takes an $N$th-order difference of the sequence.

In the DWT, each bandpass channel annihilates finitely-supported polynomials of a certain degree, which are therefore carried by the lowpass branch. That is, if $x$ is a polynomial sequence of degree smaller than $N$, the channel sequences $\beta^{(\ell)}$ are all zero, and that polynomial sequence $x$ is projected onto $V^{(J)}$, the lowpass approximation space:

$$x_n = n^p = \sum_{k\in\mathbb{Z}} \alpha^{(J)}_k\, g^{(J)}_{n-2^J k}, \qquad 0 \le p < N,$$

that is, the equivalent lowpass filter reproduces polynomials up to degree $(N - 1)$.
As this is an orthogonal DWT, the scaling coefficients follow from (9.15a):

$$\alpha^{(J)}_k = \langle x_n, g^{(J)}_{n-2^J k}\rangle_n = \langle n^p, g^{(J)}_{n-2^J k}\rangle_n = \sum_{n\in\mathbb{Z}} n^p\, g^{(J)}_{n-2^J k}.$$

An example with the 4-tap Daubechies orthogonal filter from (9.10) is given in Figure 9.10. Part (a) shows the equivalent filter after 6 levels of iteration: $J = 6$, $G^{(6)}(z) = G(z)G(z^2)G(z^4)G(z^8)G(z^{16})G(z^{32})$ and length $L^{(6)} = 190$ from (9.5b). Part (b) shows the reproduction of a linear polynomial (over a finite range, ignoring boundary effects).

In summary, the DWT, when wavelet basis sequences have zero moments, will have very small inner products with smooth parts of an input sequence (and exactly zero when the sequence is locally polynomial). This will be one key ingredient in building successful approximation schemes using the DWT in Chapter 13.
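The moment conditions behind polynomial annihilation can be verified directly for the Daubechies filter (9.10), which has $N = 2$ zeros at $z = -1$. This Python sketch computes the moments $\sum_n n^p h_n$ of the highpass filter $h_n = (-1)^n g_{3-n}$ and applies the level-1 analysis to a linear and a quadratic sequence, keeping only the coefficients whose filter support lies inside the finite test sequence, so that boundary effects are ignored in the spirit of footnote 127. The test sequences and the helper name `detail_coeffs` are hypothetical.

```python
import math

r2, r3 = math.sqrt(2), math.sqrt(3)
g = [c / (4 * r2) for c in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]   # (9.10), N = 2
h = [(-1) ** n * g[3 - n] for n in range(4)]                  # H(z) = -z^{-3} G(-z^{-1})

# Moments sum_n n^p h_n: zero for p = 0, 1 (p < N), nonzero for p = 2.
moments = [sum(n ** p * c for n, c in enumerate(h)) for p in range(3)]

def detail_coeffs(x, h):
    """beta_k = sum_m h_m x_{2k+m}, only for k where the filter fits inside x."""
    K = (len(x) - len(h)) // 2 + 1
    return [sum(hm * x[2 * k + m] for m, hm in enumerate(h)) for k in range(K)]

ramp = [3.0 - 0.5 * n for n in range(40)]        # degree-1 polynomial
parabola = [0.01 * n * n for n in range(40)]     # degree-2 polynomial
print([f"{m:+.2e}" for m in moments])
print(max(abs(b) for b in detail_coeffs(ramp, h)))        # round-off level
print(max(abs(b) for b in detail_coeffs(parabola, h)))    # clearly nonzero
```

For the parabola, every interior coefficient equals $0.01$ times the second moment, so the quadratic component leaks into the detail channel at a constant level.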

Characterization of Singularities Due to its localization properties, the DWT has a unique ability to characterize singularities.

Consider a single nonzero sample in the input, $x_n = \delta_{n-k}$. This delayed Kronecker delta sequence now excites each equivalent filter's impulse response of length $L^{(\ell)} = (L - 1)(2^\ell - 1) + 1$ at level $\ell$ (see (9.5b)), which is then downsampled by $2^\ell$ (see Figure 9.5 for an illustration with $J = 3$). Thus, this single nonzero input creates at most $(L - 1)$ nonzero coefficients in each channel. Furthermore, since each equivalent filter is of norm 1 and downsampled by $2^\ell$, the energy resulting at level $\ell$ is of the order

$$\|\beta^{(\ell)}\|^2 \sim 2^{-\ell}.$$



127 Recall that we are dealing with finitely-supported polynomial sequences, ignoring the boundary issues. If this were not the case, these sequences would not belong to any $\ell^p$ space.
Figure 9.10: Polynomial reproduction, exact over a finite range. (a) Equivalent filter's impulse response after six iterations. (b) Reproduction of a linear polynomial (in red) over a finite range. We also show the underlying weighted basis sequences ($\alpha_k\, g^{(J)}_{n-2^J k}$, $k = 0, 1, \ldots, 5$) contributing to the reproduction of the polynomial. While the plots are all discrete, they give the impression of being connected due to point density.



In other words, the energy of the Kronecker delta sequence is roughly spread across the channels according to a geometric distribution. Another way to phrase the above result is to note that as $\ell$ increases, the coefficients $\beta^{(\ell)}$ decay roughly as

$$|\beta^{(\ell)}| \sim 2^{-\ell/2}$$

when the input is an isolated Kronecker delta.

For a piecewise constant sequence, the coefficients behave instead as

$$|\beta^{(\ell)}| \sim 2^{\ell/2}.$$



Thus, two different types of singularities lead to different behaviors of wavelet coefficients across scales. In other words, if we can observe the behavior of wavelet coefficients across scales, we can make an educated guess of the type of singularity present in the input sequence, as we illustrate in Example 9.2. We will study this in more detail in the continuous-time case, in Chapter 12.

Example 9.2 (Characterization of singularities by the Haar DWT) Convolution of the Kronecker delta sequence at position $k$ with the Haar analysis filter $h^{(\ell)}_{-n}$ in (9.2c) generates $2^\ell$ coefficients; downsampling by $2^\ell$ then leaves a single coefficient of size $2^{-\ell/2}$.

As an example of a piecewise constant sequence, we use the Heaviside sequence (2.101) delayed by $k$. At most a single wavelet coefficient will be different from zero at each scale, the one corresponding to the wavelet that straddles the discontinuity (Figure 9.11). At scale $2^\ell$, this corresponds to the wavelet with support from $2^\ell\lfloor k/2^\ell\rfloor$ to $2^\ell(\lfloor k/2^\ell\rfloor + 1)$. All other wavelet coefficients are zero; on the left of the discontinuity because the sequence is zero, and on the right because the inner product is zero. The magnitude of the nonzero coefficient depends on the location $k$ and varies between $0$ and $2^{(\ell/2)-1}$. When $k$ is a multiple of $2^\ell$, this magnitude is zero, and when $k$ is equal to $2^\ell\lfloor k/2^\ell\rfloor + 2^{\ell-1}$, it achieves its maximum value. The latter occurs when the discontinuity is aligned with the discontinuity


a3.0 [October 2011] CC by-nc-nd

Comments to book-errata@FourierAndWavelets.org

Fourier and Wavelet Signal Processing

Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

694

Chapter 9. Wavelet Bases on Sequences




Figure 9.11: Characterization of singularities by the Haar DWT. (a) A Heaviside sequence at location $k$, and (b) the equivalent wavelet sequences, highpass filter $h^{(\ell)}$ and its shifts, at scale $2^\ell$. A single wavelet, with support from $n_1 = 2^\ell\lfloor k/2^\ell\rfloor$ to $n_2 = 2^\ell(\lfloor k/2^\ell\rfloor + 1)$, has a nonzero inner product with a magnitude of the order of $2^{\ell/2}$.



of the wavelet itself; then, the inner product is $2^{\ell-1}\cdot 2^{-\ell/2} = 2^{(\ell/2)-1}$, and we obtain

$$|\beta^{(\ell)}_k| \le 2^{(\ell/2)-1}.$$



We thus see that the magnitudes of the wavelet coefficients will vary but, as $\ell$ increases, they will increase at most as $2^{\ell/2}$. In Figure 9.12(a), we show an example input sequence consisting of a piecewise constant sequence and a Kronecker delta sequence, and its DWT in (b)–(f). We see that the wavelet coefficients are gathered around the singular points (Kronecker delta, Heaviside step), and that they decay or increase, depending on the type of singularity.
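The two behaviors in Example 9.2 can be checked numerically. The sketch below is a minimal illustration, assuming the orthonormal Haar pair with lowpass $(a_{2m}+a_{2m+1})/\sqrt{2}$ and highpass $(a_{2m}-a_{2m+1})/\sqrt{2}$ (the sign convention may differ from the one used in the text; only magnitudes matter here). The helper names `haar_dwt` and `step` are hypothetical. For a Kronecker delta, every scale has exactly one nonzero coefficient of magnitude $2^{-\ell/2}$. For a delayed Heaviside step, at most one coefficient per scale is nonzero, and with $k = 2^{\ell-1}$ it attains the maximum $2^{\ell/2-1}$.

```python
import numpy as np

def haar_dwt(x, J):
    """J-level orthonormal Haar DWT of a sequence whose length is divisible by 2^J.

    Assumed convention: lowpass (a[2m] + a[2m+1]) / sqrt(2),
    highpass (a[2m] - a[2m+1]) / sqrt(2). Returns alpha^(J) and a list whose
    entry l-1 holds the wavelet coefficients beta^(l) at scale 2^l.
    """
    a = np.asarray(x, dtype=float)
    betas = []
    for _ in range(J):
        pairs = a.reshape(-1, 2)
        betas.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))
        a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    return a, betas

def step(N, k):
    """Heaviside step delayed by k: x_n = 1 for n >= k, 0 otherwise."""
    x = np.zeros(N)
    x[k:] = 1.0
    return x

N, J = 64, 5

# Kronecker delta at k = 37.
delta = np.zeros(N)
delta[37] = 1.0
_, beta_delta = haar_dwt(delta, J)

for l in range(1, J + 1):
    b = beta_delta[l - 1]
    # With k = 2^(l-1), the step sits in the middle of the first wavelet at scale 2^l.
    _, beta_step = haar_dwt(step(N, 2 ** (l - 1)), J)
    print(l,
          np.count_nonzero(np.abs(b) > 1e-12),   # nonzero delta coefficients
          np.abs(b).max(), 2 ** (-l / 2),        # delta magnitude vs 2^(-l/2)
          np.abs(beta_step[l - 1]).max(), 2 ** (l / 2 - 1))  # step peak vs bound
```

Comparing the columns level by level shows the decay for the delta and the growth for the step that the example predicts.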

In the example above, we obtained precisely $|\beta^{(\ell)}| = 2^{-\ell/2}$ for the one nonzero wavelet coefficient at scale $2^\ell$ of the Kronecker delta. Figure 9.12 gives another example, with a sequence with more types of singularities and a DWT with longer filters. We again have $\sim 2^{-\ell/2}$ scaling of wavelet coefficient magnitudes and a roughly constant number of nonzero wavelet coefficients per scale. We will study this effect in more detail in Chapter 13, where the bounds on the coefficient magnitudes will play a large role in quantifying approximation performance.

In summary, the DWT acts as a singularity detector, that is, it leads to nonzero wavelet coefficients around singular points of a sequence. Around each isolated singularity, the number of nonzero coefficients per scale is bounded by $L - 1$. Moreover, the magnitude of the wavelet coefficients across scales is an indicator of the type of singularity. Together with its
Figure 9.12: A piecewise constant sequence plus a Kronecker delta sequence and its DWT. (a) The original sequence $x$. (b)–(e) Wavelet coefficients $\beta^{(\ell)}$ at scales $2^\ell$, $\ell = 1, 2, 3, 4$. (f) Scaling coefficients $\alpha^{(4)}$. To compare different channels, all the sequences have been upsampled by a factor $2^\ell$, $\ell = 1, 2, 3, 4$.



polynomial approximation properties as described previously, this ability to characterize singularities will be the other key ingredient in building successful approximation schemes using the DWT in Chapter 13.
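The bound of $L - 1$ nonzero coefficients per scale can be observed with longer filters. The sketch below is an illustration under stated assumptions: it uses the length-4 Daubechies orthonormal lowpass filter ($L = 4$), the alternating-flip highpass $h_i = (-1)^i g_{L-1-i}$ (one common convention), and periodic boundary handling. None of these choices comes from the text. It counts the nonzero wavelet coefficients at each scale for an isolated Kronecker delta and checks that the periodized transform preserves energy.

```python
import numpy as np

# Assumed filters: Daubechies length-4 lowpass (L = 4) and alternating-flip highpass.
s3 = np.sqrt(3.0)
g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
h = np.array([(-1) ** i * g[len(g) - 1 - i] for i in range(len(g))])

def analysis_step(a, g, h):
    """Inner products <a_n, g_{n-2m}> and <a_n, h_{n-2m}> with periodic indexing."""
    n = len(a)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(g))[None, :]) % n
    return a[idx] @ g, a[idx] @ h

def dwt(x, J):
    """J-level periodized DWT: returns alpha^(J) and [beta^(1), ..., beta^(J)]."""
    a, betas = np.asarray(x, dtype=float), []
    for _ in range(J):
        a, b = analysis_step(a, g, h)
        betas.append(b)
    return a, betas

N, J = 64, 4                 # coarsest input length is N / 2^(J-1) = 8 >= L
x = np.zeros(N)
x[30] = 1.0                  # isolated Kronecker delta
alpha, betas = dwt(x, J)
counts = [int(np.count_nonzero(np.abs(b) > 1e-12)) for b in betas]
print(counts)                # each entry is at most L - 1 = 3
```

Keeping the coarsest input length at least $L$ avoids a filter wrapping onto itself more than once, so the periodized transform behaves like the infinite-length one near the delta.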

Basis for $\ell^2(\mathbb{Z})$ An interesting twist on a $J$-level DWT is what happens when we let the number of levels $J$ go to infinity. This will be one way we will be building continuous-time wavelet bases in Chapter 12. For sequences in $\ell^2(\mathbb{Z})$, such an infinitely-iterated DWT can actually build an orthonormal basis based on wavelet sequences (equivalent highpass filters) alone. The energy of the scaling coefficients vanishes in $\ell^2$ norm; in other words, the original sequence is entirely captured by the wavelet coefficients, thus proving Parseval's equality for such a basis. While this is true in general, below we prove the result for Haar filters only; the general proof needs additional technical conditions that are beyond the scope of our text.
Proof. To prove $\Phi$ is an orthonormal basis, we must prove it is an orthonormal set and that it is complete. The orthonormality of basis functions was shown earlier in Section 9.1 for Haar filters and in Section 9.2 for the more general ones.

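To prove completeness, we will show that Parseval's equality holds, that is, for an arbitrary input $x \in \ell^2(\mathbb{Z})$, we have

$$\|x\|^2 = \sum_{\ell=1}^{\infty} \|\beta^{(\ell)}\|^2, \qquad (9.19)$$

where $\beta^{(\ell)}$ are the wavelet coefficients at scales $2^\ell$, $\ell = 1, 2, \ldots$:

$$\beta^{(\ell)}_k = \langle x_n, h^{(\ell)}_{n-2^\ell k}\rangle_n.$$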



For any finite number of decomposition levels $J$, Parseval's equality (9.18) holds, that is, $\|x\|^2 = \|\alpha^{(J)}\|^2 + \sum_{\ell=1}^{J}\|\beta^{(\ell)}\|^2$. Thus, our task is to show $\lim_{J\to\infty}\|\alpha^{(J)}\| = 0$. We show this by bounding two quantities: the energy lost in truncating $x$ and the energy in the scaling coefficients that represent the truncated sequence.

Without loss of generality, assume $x$ has unit norm. For any $\varepsilon > 0$, we will show that $\|\alpha^{(J)}\|^2 < \varepsilon$ for sufficiently large $J$. First note that there exists a $K$ such that the restriction of $x$ to $\{-2^K, -2^K+1, \ldots, 0, 1, \ldots, 2^K-1\}$ has energy at least $1 - \varepsilon/2$; this follows from the convergence of the series defining the $\ell^2$ norm of $x$. Denote the restriction by $\tilde{x}$.

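A $K$-level decomposition of $\tilde{x}$ has at most two nonzero scaling coefficients, $\tilde{\alpha}^{(K)}_{-1}$ and $\tilde{\alpha}^{(K)}_{0}$. Each of these scaling coefficients satisfies $|\tilde{\alpha}^{(K)}_k| \le 1$ because $\|\tilde{\alpha}^{(K)}\|^2 \le \|\tilde{x}\|^2 \le \|x\|^2 = 1$ by Bessel's inequality. We will now consider further levels of decomposition beyond $K$. After one more level, the lowpass output $\tilde{\alpha}^{(K+1)}$ has coefficients

$$\tilde{\alpha}^{(K+1)}_k = \frac{1}{\sqrt{2}}\,\tilde{\alpha}^{(K)}_k, \qquad k = -1, 0,$$

since each is formed from one nonzero and one zero scaling coefficient at level $K$. Similarly, after $K + j$ total levels of decomposition we have

$$|\tilde{\alpha}^{(K+j)}_k| = \frac{1}{2^{j/2}}\,|\tilde{\alpha}^{(K)}_k| \le \frac{1}{2^{j/2}}, \qquad k = -1, 0.$$

Thus, $\|\tilde{\alpha}^{(K+j)}\|^2 = (\tilde{\alpha}^{(K+j)}_{-1})^2 + (\tilde{\alpha}^{(K+j)}_{0})^2 \le 2^{-(j-1)}$.

Let $J = K + j$, where $j$ is chosen so that $2^{-(j-1)} < \varepsilon/2$. Then $\|\alpha^{(J)}\|^2 < \varepsilon$ because $\|\tilde{\alpha}^{(J)}\|^2 < \varepsilon/2$ and $\|\alpha^{(J)}\|^2$ cannot exceed $\|\tilde{\alpha}^{(J)}\|^2$ by more than the energy $\varepsilon/2$ excluded in the truncation of $x$.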
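The two facts used in the proof, finite-$J$ Parseval equality and the halving of the scaling energy at every level beyond $K$, can be checked numerically. The sketch below is illustrative only. It takes a finitely supported $x$ (so the truncation step is exact, $\tilde{x} = x$), embeds $\ell^2(\mathbb{Z})$ in a long finite array whose index `origin` plays the role of $n = 0$, and stops well before the array collapses to a single coefficient. The array sizes and the random input are arbitrary choices, not taken from the text.

```python
import numpy as np

def haar_step(a):
    """One orthonormal Haar analysis step on an even-length array."""
    p = a.reshape(-1, 2)
    return (p[:, 0] + p[:, 1]) / np.sqrt(2), (p[:, 0] - p[:, 1]) / np.sqrt(2)

M, K = 12, 3                    # array of length 2^M; support {-2^K, ..., 2^K - 1}
origin = 2 ** (M - 1)           # array index standing in for n = 0
rng = np.random.default_rng(0)
x = np.zeros(2 ** M)
x[origin - 2 ** K: origin + 2 ** K] = rng.standard_normal(2 ** (K + 1))
x /= np.linalg.norm(x)          # unit norm, as in the proof

a, wavelet_energy, scaling_energy = x, 0.0, []
for level in range(1, M - 1):   # stop well before everything merges into one coefficient
    a, b = haar_step(a)
    wavelet_energy += np.sum(b ** 2)
    scaling_energy.append(np.sum(a ** 2))
    # Finite-J Parseval: ||x||^2 = ||alpha^(J)||^2 + sum_l ||beta^(l)||^2.
    assert np.isclose(scaling_energy[-1] + wavelet_energy, 1.0)

# Beyond level K, the two surviving scaling coefficients shrink by 1/sqrt(2) per level.
for j in range(1, M - 1 - K):
    print(K + j, scaling_energy[K - 1 + j] / scaling_energy[K - 1])
```

The printed ratios follow $2^{-j}$, which is the geometric decay that drives $\|\tilde{\alpha}^{(K+j)}\|^2 \le 2^{-(j-1)}$.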

9.4 Biorthogonal Discrete Wavelet Transform 

We have seen that the properties of the dyadic orthogonal DWT follow from the 
properties of the orthogonal two-channel filter bank. Similarly, the properties of 
the biorthogonal DWT follow from the properties of the biorthogonal two-channel 
filter bank. Instead of fully developing the biorthogonal DWT (as it is parallel to 
the orthogonal one), we quickly summarize its salient elements. 

9.4.1 Definition of the Biorthogonal DWT 

If, instead of an orthogonal pair of highpass/lowpass filters, we use a biorthogonal set $\{h, g, \tilde{h}, \tilde{g}\}$ as in (7.64a)–(7.64d), we obtain two sets of equivalent filters, one for
Figure 9.13: Iteration of a biorthogonal pair of lowpass filters from (9.21): (a) the iteration of $g_n$ leads to a smooth-looking sequence, while (b) the iteration of $\tilde{g}_n$ does not.



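the synthesis side, $G^{(\ell)}, H^{(\ell)}$, $\ell = 1, 2, \ldots, J$, and the other for the analysis side, $\tilde{G}^{(\ell)}, \tilde{H}^{(\ell)}$, $\ell = 1, 2, \ldots, J$:

$$G^{(J)}(z) = \prod_{k=0}^{J-1} G\big(z^{2^k}\big), \qquad H^{(\ell)}(z) = H\big(z^{2^{\ell-1}}\big)\,G^{(\ell-1)}(z), \qquad (9.20a)$$

$$\tilde{G}^{(J)}(z) = \prod_{k=0}^{J-1} \tilde{G}\big(z^{2^k}\big), \qquad \tilde{H}^{(\ell)}(z) = \tilde{H}\big(z^{2^{\ell-1}}\big)\,\tilde{G}^{(\ell-1)}(z), \qquad (9.20b)$$

for $\ell = 1, \ldots, J$. This iterated product will play a crucial role in the construction of continuous-time wavelet bases in Chapter 12, as $g^{(J)}$ and $\tilde{g}^{(J)}$ can exhibit quite different behaviors; we illustrate this with an example.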
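The recursions (9.20a) and (9.20b) can be evaluated with nothing more than upsampling and convolution. The sketch below is one possible implementation; the helper names `upsample` and `equivalent_filters` are hypothetical. Filters are stored as coefficient arrays in powers of $z^{-1}$, so $F(z^m)$ is obtained by inserting $m-1$ zeros between taps. The same function serves both sides: pass $(g, h)$ for (9.20a) or $(\tilde{g}, \tilde{h})$ for (9.20b). For length-$L$ filters, the product structure gives equivalent filters of length $(L-1)(2^\ell - 1) + 1$ (compare (9.5b)).

```python
import numpy as np

def upsample(f, m):
    """Coefficients of F(z^m): insert m - 1 zeros between consecutive taps."""
    out = np.zeros((len(f) - 1) * m + 1)
    out[::m] = f
    return out

def equivalent_filters(g, h, J):
    """Equivalent lowpass G^(J) and highpasses H^(l), l = 1..J, following (9.20a)."""
    G = np.array([1.0])                                       # G^(0)(z) = 1
    H_list = []
    for l in range(1, J + 1):
        H_list.append(np.convolve(upsample(h, 2 ** (l - 1)), G))  # H(z^(2^(l-1))) G^(l-1)(z)
        G = np.convolve(upsample(g, 2 ** (l - 1)), G)             # G(z^(2^(l-1))) G^(l-1)(z)
    return G, H_list

# Haar example: G^(3) is a length-8 box, H^(3) a length-8 step of height 2^(-3/2).
g_haar = np.array([1.0, 1.0]) / np.sqrt(2)
h_haar = np.array([1.0, -1.0]) / np.sqrt(2)
G3, H_haar = equivalent_filters(g_haar, h_haar, 3)
print(G3)
print(H_haar[2])
```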

Example 9.3 (Biorthogonal DWT) In Example 7.4 we derived a biorthogonal pair with lowpass filters

$$g_n = [\ldots\ 1\ \ 3\ \ 3\ \ 1\ \ldots], \qquad (9.21a)$$

$$\tilde{g}_n \propto [\ldots\ {-1}\ \ 3\ \ 3\ \ {-1}\ \ldots]. \qquad (9.21b)$$



Figure 9.13 shows the first few iterations of $g^{(J)}$ and $\tilde{g}^{(J)}$, indicating a very different behavior. Recall that both filters are lowpass filters, as are their iterated versions. However, the iteration of $\tilde{g}$ does not look smooth, indicating possible problems as we iterate to infinity (as we will see in Chapter 12).
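The contrast in Figure 9.13 can be reproduced with a few lines of code. The sketch below makes an assumption of its own: since only the shapes of (9.21) matter here, each filter is rescaled so that its taps sum to 2. With that scaling, the $J$-th iterate approximates samples $\varphi(n/2^J)$ of a limit function, if one exists. The iterate of $[1\ 3\ 3\ 1]/4$ stays bounded by $0.75$ at every level (each output is a convex combination of previous values). The iterate of $[-1\ 3\ 3\ -1]/2$ has negative taps, and its peak already grows from $1.5$ at $J = 1$ to $2.5$ at $J = 2$.

```python
import numpy as np

def iterate_lowpass(g, J):
    """Coefficients of prod_{k=0}^{J-1} G(z^(2^k)), the J-times iterated lowpass."""
    out = np.array([1.0])
    for k in range(J):
        up = np.zeros((len(g) - 1) * 2 ** k + 1)
        up[:: 2 ** k] = g                     # coefficients of G(z^(2^k))
        out = np.convolve(out, up)
    return out

# Assumed normalization: taps sum to 2 (shapes from (9.21a) and (9.21b)).
g = np.array([1.0, 3.0, 3.0, 1.0]) / 4.0
g_dual = np.array([-1.0, 3.0, 3.0, -1.0]) / 2.0

for J in range(1, 7):
    p, d = iterate_lowpass(g, J), iterate_lowpass(g_dual, J)
    # Peak value and largest jump between neighboring samples, for each iterate.
    print(J, p.max(), np.abs(np.diff(p)).max(), np.abs(d).max(), np.abs(np.diff(d)).max())
```

Comparing the largest jump between neighboring samples as $J$ grows gives a crude numerical counterpart of "smooth-looking" versus "not smooth".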

Similarly to the properties that equivalent filters satisfy in an orthogonal DWT (Section 9.2), analogous properties hold here, mimicking the biorthogonal relations from Section 7.4; their formulation and proofs are left as an exercise to the reader (see Exercise 9.3).



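Definition 9.3 (Biorthogonal DWT) The $J$-level biorthogonal DWT of a sequence $x$ is a function of $\ell \in \{1, 2, \ldots, J\}$ given by

$$\alpha^{(J)}_k = \langle x_n, \tilde{g}^{(J)}_{n-2^J k}\rangle_n, \qquad \beta^{(\ell)}_k = \langle x_n, \tilde{h}^{(\ell)}_{n-2^\ell k}\rangle_n, \qquad k, n \in \mathbb{Z},\ \ell \in \{1, 2, \ldots, J\}. \qquad (9.22a)$$

The inverse DWT is given by

$$x_n = \sum_{k\in\mathbb{Z}} \alpha^{(J)}_k\, g^{(J)}_{n-2^J k} + \sum_{\ell=1}^{J}\sum_{k\in\mathbb{Z}} \beta^{(\ell)}_k\, h^{(\ell)}_{n-2^\ell k}.$$

In the above, the $\alpha^{(J)}$ are the scaling coefficients and the $\beta^{(\ell)}$ are the wavelet coefficients.

The equivalent filters $g^{(J)}, \tilde{g}^{(J)}$ are often called the scaling sequences, and $h^{(\ell)}, \tilde{h}^{(\ell)}$, $\ell = 1, 2, \ldots, J$, wavelets (wavelet sequences).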



9.4.2 Properties of the Biorthogonal DWT 

Similarly to the orthogonal DWT, the biorthogonal DWT is linear and shift varying. As a biorthogonal expansion, it does not satisfy Parseval's equality. However, as we now have access to dual bases, we can choose which one to use for projection (analysis) and which one for reconstruction (synthesis). This allows us to choose the better-suited one of $g$ and $\tilde{g}$ to induce an expansion with the desired polynomial approximation properties and characterization of singularities.



9.5 Wavelet Packets 

So far, the iterated decomposition was always applied to the lowpass channel, and often, there are good reasons to do so. However, to match a wide range of sequences, we can consider an arbitrary tree decomposition. In other words, start with a sequence $x$ and decompose it into a lowpass and a highpass version.$^{128}$ Then, decide if the lowpass, the highpass, or both, are decomposed further, and keep going until a given depth $J$. The DWT is thus one particular case, in which only the lowpass version is repeatedly decomposed. Figure 9.7 depicts some of these decomposition possibilities.



128 Of course, there is no reason why one could not split into $N$ channels initially.
For example, the full tree yields a linear division of the spectrum similar to the local Fourier transform from Chapter 8, while the octave-band tree performs a $J$-level DWT expansion. Such arbitrary tree structures were introduced as a family of orthonormal bases for discrete-time sequences, and are known under the name of wavelet packets. The potential of wavelet packets lies in the capacity to offer a rich menu of orthonormal bases, from which the best one can be chosen (best according to a given criterion). We discuss this in more detail in Chapter 13.

What we do here is define the basis functions and write down the appropriate 
orthogonality relations; since the proofs are in principle similar to those for the 
DWT, we chose to omit them. 

9.5.1 Definition of the Wavelet Packets 

Equivalent Channels and Their Properties Denote the equivalent filters by $g^{(\ell)}_i$, $i = 0, \ldots, 2^\ell - 1$. In other words, $g^{(\ell)}_i$ is the $i$th equivalent filter going through one of the possible paths of length $\ell$. The ordering is somewhat arbitrary, and we will choose the one corresponding to a full tree with a lowpass in the lower branch of each fork, and start numbering from the bottom.

Example 9.4 (2-level wavelet packet equivalent filters) Let us find all equivalent filters at level 2, that is, the filters corresponding to depth-1 and depth-2 trees:

$$G^{(1)}_0(z) = G_0(z), \qquad G^{(1)}_1(z) = G_1(z),$$

$$G^{(2)}_0(z) = G_0(z)\,G_0(z^2), \qquad G^{(2)}_1(z) = G_0(z)\,G_1(z^2),$$

$$G^{(2)}_2(z) = G_1(z)\,G_0(z^2), \qquad G^{(2)}_3(z) = G_1(z)\,G_1(z^2). \qquad (9.23)$$

With the ordering chosen in the above equations for level 2, increasing index does not always correspond to increasing frequency. For ideal filters, $G^{(2)}_2(e^{j\omega})$ chooses the range $[3\pi/4, \pi)$, while $G^{(2)}_3(e^{j\omega})$ covers the range $[\pi/2, 3\pi/4)$. Besides the identity basis, which corresponds to the no-split situation, we have four possible orthonormal bases (full 2-level split, full 1-level split, full 1-level split plus either lowpass or highpass split).
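The frequency ordering in Example 9.4 can be seen even with Haar filters, which are far from ideal. The sketch below assumes $g_0 = [1\ 1]/\sqrt{2}$ and $g_1 = [1\ {-1}]/\sqrt{2}$ (one sign convention among several). It builds the four level-2 filters of (9.23), checks that they form an orthonormal set, and locates the frequency in $[0, \pi]$ where each magnitude response peaks. The peaks fall near $0$, $0.39\pi$, $\pi$ and $0.61\pi$ for indices $0, 1, 2, 3$, inside the ideal bands quoted above and in the same non-monotonic order.

```python
import numpy as np

def up2(f):
    """Coefficients of F(z^2)."""
    out = np.zeros(2 * len(f) - 1)
    out[::2] = f
    return out

# Assumed Haar pair.
g0 = np.array([1.0, 1.0]) / np.sqrt(2)
g1 = np.array([1.0, -1.0]) / np.sqrt(2)

# Level-2 equivalent filters in the ordering of (9.23).
packets = [np.convolve(g0, up2(g0)),   # G_0^(2)(z) = G_0(z) G_0(z^2)
           np.convolve(g0, up2(g1)),   # G_1^(2)(z) = G_0(z) G_1(z^2)
           np.convolve(g1, up2(g0)),   # G_2^(2)(z) = G_1(z) G_0(z^2)
           np.convolve(g1, up2(g1))]   # G_3^(2)(z) = G_1(z) G_1(z^2)

w = np.linspace(0.0, np.pi, 4001)

def mag(f):
    """|F(e^{jw})| on the grid w."""
    return np.abs(np.exp(-1j * np.outer(w, np.arange(len(f)))) @ f)

peaks = [w[np.argmax(mag(f))] / np.pi for f in packets]   # in units of pi
print(peaks)
```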

Wavelet Packet Bases Among the myriad of possible bases that wavelet packets generate, one can choose the one best fitting the sequence at hand.

Example 9.5 (2-level wavelet packet bases) Continuing Example 9.4, we have a family $\mathcal{W} = \{\Phi_0, \Phi_1, \Phi_2, \Phi_3, \Phi_4\}$, where $\Phi_4$ is simply $\{\delta_{n-k}\}_{k\in\mathbb{Z}}$;

$$\Phi_0 = \{g^{(2)}_{0,n-2^2k},\ g^{(2)}_{1,n-2^2k},\ g^{(2)}_{2,n-2^2k},\ g^{(2)}_{3,n-2^2k}\}_{k\in\mathbb{Z}}$$

corresponds to the full tree;

$$\Phi_1 = \{g^{(1)}_{1,n-2k},\ g^{(2)}_{1,n-2^2k},\ g^{(2)}_{0,n-2^2k}\}_{k\in\mathbb{Z}}$$

corresponds to the DWT tree;

$$\Phi_2 = \{g^{(1)}_{0,n-2k},\ g^{(2)}_{2,n-2^2k},\ g^{(2)}_{3,n-2^2k}\}_{k\in\mathbb{Z}}$$

corresponds to the tree with the highpass split twice; and

$$\Phi_3 = \{g^{(1)}_{0,n-2k},\ g^{(1)}_{1,n-2k}\}_{k\in\mathbb{Z}}$$

corresponds to the usual two-channel filter bank basis.
In general, we will have Fourier-like bases, given by

$$\Phi_0 = \{g^{(J)}_{0,n-2^Jk},\ \ldots,\ g^{(J)}_{2^J-1,n-2^Jk}\}_{k\in\mathbb{Z}}, \qquad (9.24)$$

and wavelet-like bases, given by

$$\Phi_1 = \{g^{(1)}_{1,n-2k},\ g^{(2)}_{1,n-2^2k},\ \ldots,\ g^{(J)}_{1,n-2^Jk},\ g^{(J)}_{0,n-2^Jk}\}_{k\in\mathbb{Z}}. \qquad (9.25)$$

That these are all bases follows trivially from each building block being a basis (either orthonormal or biorthogonal).

9.5.2 Properties of the Wavelet Packets 

Exercises at the end of this chapter discuss various forms and properties of wavelet packets: biorthogonal wavelet packets in Exercise 9.4 and arbitrary wavelet packets in Exercise 9.4.

Number of Wavelet Packets How many wavelet packets (different trees) are there? Call $N^{(J)}$ the number of trees of depth $J$; then we have the recursion

$$N^{(J)} = \big(N^{(J-1)}\big)^2 + 1, \qquad (9.26)$$

since each branch of the initial two-channel filter bank can have $N^{(J-1)}$ possible trees attached to it, and the $+1$ comes from not splitting at all. As an initial condition, we have $N^{(1)} = 2$ (either no split or a single split). It can be shown that the recursion leads to an order of

$$N^{(J)} \sim 2^{2^J} \qquad (9.27)$$

possible trees. Of course, many of these trees will be poor matches to real-life sequences, but an efficient search algorithm that finds the best match between a given sequence and a tree-structured expansion is possible. The proof of (9.26) is left as Exercise 9.3.
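Iterating (9.26) directly makes the double-exponential growth concrete. The sketch below is a straightforward transcription of the recursion with the initial condition $N^{(1)} = 2$; `num_trees` is a hypothetical helper name. For $J = 2$ it returns 5, matching the five bases $\Phi_0, \ldots, \Phi_4$ of Example 9.5. Because each step roughly squares the previous count, $\log_2 N^{(J)}$ roughly doubles with every additional level.

```python
import math

def num_trees(J):
    """Number of wavelet-packet trees of depth J, from the recursion (9.26)."""
    n = 2                      # N^(1) = 2: no split, or a single split
    for _ in range(2, J + 1):
        n = n * n + 1          # one subtree per branch, plus the no-split option
    return n

for J in range(1, 7):
    n = num_trees(J)
    print(J, n, round(math.log2(n), 2))
```

Python integers are exact, so the counts stay correct even when they no longer fit in 64 bits.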

9.6 Computational Aspects 

We now consider the computational complexity of the DWT, and show an elemen- 
tary but astonishing result, in large part responsible for the popularity of the DWT: 
the complexity is linear in the input size. 
Complexity of the DWT Computing a DWT amounts to computing a set of convolutions, but with a twist crucial to the computational efficiency of the transform: as the decomposition progresses down the tree (see, for example, Figure 9.7), the sampling rate decreases. The implementation of a $J$-level DWT for a length-$N$ signal with a filter bank is equivalent to a factorization

$$\begin{bmatrix} I_{(1-2^{-J+1})N} & \\ & \begin{bmatrix} H^{(J)} \\ G^{(J)} \end{bmatrix} \end{bmatrix} \cdots \begin{bmatrix} I_{3N/4} & \\ & \begin{bmatrix} H^{(3)} \\ G^{(3)} \end{bmatrix} \end{bmatrix} \begin{bmatrix} I_{N/2} & \\ & \begin{bmatrix} H^{(2)} \\ G^{(2)} \end{bmatrix} \end{bmatrix} \begin{bmatrix} H^{(1)} \\ G^{(1)} \end{bmatrix},$$

where the $H^{(\ell)}$ and $G^{(\ell)}$ are the highpass and lowpass operators, respectively, each with downsampling by two (see (2.194) and (2.197) for $\ell = 1$), both sparse.

In the DWT tree, the second level has a cost similar to that of the first, but at half the sampling rate (see (2.268)). Continuing this argument, the cost of the $J$-level DWT per input sample is

$$L + \frac{L}{2} + \frac{L}{2^2} + \cdots + \frac{L}{2^{J-1}} < 2L$$

in both multiplications and additions, with the total cost of the order of at most

$$C_{\mathrm{DWT}} \sim 2NL \sim O(N), \qquad (9.28)$$

that is, it is linear in the input size, with a constant depending on the filter length.
While the cost remains bounded, the delay does not. If the first block contributes a delay $D$, the second will produce a delay $2D$ and the $\ell$th block a delay $2^{\ell-1}D$, for a total delay of

$$D_{\mathrm{DWT}} = D + 2D + 2^2D + \cdots + 2^{J-1}D = (2^J - 1)D.$$

This large delay is a serious drawback, especially for real-time applications such as speech coding.
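The cost argument behind (9.28) can be checked by instrumenting a DWT. The sketch below is illustrative only. It uses a periodized two-channel analysis step that computes only the outputs kept after downsampling, counts every multiplication, and uses random filter taps, since the values do not affect the count. The names `analysis_step`, `dwt_ops` and `dwt_delay` are hypothetical. The count equals $2LN(1 - 2^{-J})$, which stays below $2LN$ for every $J$, while the delay $(2^J - 1)D$ grows without bound.

```python
import numpy as np

def analysis_step(a, g, h, ops):
    """Periodized two-channel analysis computing only the kept (downsampled) outputs.
    ops[0] accumulates the number of multiplications performed."""
    n, L = len(a), len(g)
    lo, hi = np.zeros(n // 2), np.zeros(n // 2)
    for m in range(n // 2):
        for i in range(L):
            s = a[(2 * m + i) % n]
            lo[m] += g[i] * s
            hi[m] += h[i] * s
            ops[0] += 2
    return lo, hi

def dwt_ops(N, L, J, seed=0):
    """Multiplications used by a J-level DWT (only the lowpass branch is iterated)."""
    rng = np.random.default_rng(seed)
    g, h = rng.standard_normal(L), rng.standard_normal(L)   # values irrelevant to the count
    a, ops = rng.standard_normal(N), [0]
    for _ in range(J):
        a, _ = analysis_step(a, g, h, ops)
    return ops[0]

def dwt_delay(D, J):
    """Total delay D + 2D + ... + 2^(J-1) D."""
    return D * (2 ** J - 1)

N, L = 256, 4
for J in range(1, 6):
    print(J, dwt_ops(N, L, J), 2 * L * N, dwt_delay(1, J))
```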

Complexity of the General Wavelet Packets What happens for more general trees? Clearly, the worst case is for a full tree (see Figure 9.7(e)).

We start with a naive implementation of a $2^J$-channel filter bank, downsampled by $2^J$. Recall that, according to (9.5b), the lengths of the equivalent filters (in the DWT or the full tree) are of the order $O(L2^J)$. Computing each filter output and downsampling by $2^J$ leads to $NL$ operations per channel ($L$ per input sample), or, for $2^J$ channels, we obtain

$$C_{\mathrm{direct}} \sim NL2^J \sim O(NL2^J), \qquad (9.29)$$

which grows exponentially with $J$.

The tree implementation does much better: as the sampling rate goes down, the number of channels goes up, and the two effects cancel each other. Therefore, for $J$ levels, the cost amounts to

$$C_{\mathrm{full}} \sim NLJ \sim O(NLJ) \qquad (9.30)$$

multiplications or additions, again for a length-$N$ sequence and length-$L$ filter. Exercise 9.5 compares these two implementations of the full tree with two Fourier-based ones, showing gains in computational cost.
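The two full-tree costs can be compared by counting multiplications level by level. In the sketch below, the tree count follows the argument for (9.30): level $\ell$ has $2^{\ell-1}$ two-channel blocks, each operating on $N/2^{\ell-1}$ samples. The direct count uses the exact equivalent-filter length $(L-1)(2^J-1)+1$, which is of the order $L2^J$ stated in the text. Both helper names are hypothetical, and only multiplications are counted.

```python
def full_tree_ops(N, L, J):
    """Multiplications for a J-level full two-channel tree (tree implementation)."""
    ops = 0
    for level in range(1, J + 1):
        blocks = 2 ** (level - 1)                # two-channel blocks at this level
        n_in = N // 2 ** (level - 1)             # input length of each block
        ops += blocks * (n_in // 2) * 2 * L      # n_in/2 outputs per channel, 2 channels, L taps
    return ops

def direct_ops(N, L, J):
    """Multiplications for the naive 2^J-channel implementation behind (9.29)."""
    eq_len = (L - 1) * (2 ** J - 1) + 1          # equivalent filter length, order L 2^J
    return 2 ** J * (N // 2 ** J) * eq_len       # channels x outputs per channel x taps

N, L = 1024, 4
for J in range(1, 8):
    print(J, full_tree_ops(N, L, J), N * L * J, direct_ops(N, L, J), N * L * 2 ** J)
```

The tree count matches $NLJ$ exactly, while the direct count grows roughly like $NL2^J$.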
Chapter at a Glance

The goal of this chapter was twofold: (1) to extend the discussion from Chapters 7 and 8 to multichannel filter banks constructed as trees and their associated bases; and (2) to consider those filter banks implementing the DWT and wavelet packets.

While, in general, tree-structured filter banks can have general $N$-channel filter banks as their building blocks, here we concentrated mostly on those built using basic two-channel filter banks. Moreover, the bulk of the chapter was devoted to octave-band tree-structured filter banks, where only the lowpass channel is further decomposed, as they implement the DWT. Such a decomposition is a natural one, with both theoretical and experimental evidence to support its use. Experimentally, research shows that the human visual system decomposes the field of view into octave bands; in parallel, theoretically, the DWT is an appropriate tool for the analysis of smooth sequences with isolated discontinuities. Moreover, the DWT has interesting polynomial approximation powers as well as the ability to characterize singularities.

Wavelet packets extend these ideas to more general tilings of the time-frequency plane, adapting the decomposition to the sequence at hand. Here we discussed the decompositions only; the criteria for which decomposition to use are left for later.



Table 9.3: DWT filter bank. Block diagrams: tree structure; multichannel structure.
Historical Remarks

The tree-structured filter banks, and, in particular, octave-band, or constant-Q, ones, have been used in speech and audio. A similar but redundant scheme was proposed as a pyramid coding technique by Burt and Adelson [21], and served as the initial link between the discrete-time setting and the works in the continuous-time setting of Daubechies [39] and Mallat [99]. This prompted a flurry of connections between the wavelet transform, filter banks, and subband coding schemes, for example, the biorthogonal bases by Herley and Vetterli [71]. It further opened the door to a formal treatment and definition of the DWT as a purely discrete transform, and not only as a vehicle for implementing continuous-time ones. Rioul [120] rigorously defined the discrete multiresolution analysis, and Coifman, Meyer, Quake and Wickerhauser [31] proposed wavelet packets as an adaptive tool for signal analysis. As a result of these developments and its low computational complexity, the DWT and its variations found their way into numerous applications and standards, JPEG 2000 among others.



Further Reading 

Books and Textbooks Numerous books cover the topic of the DWT, such as those by Vetterli and Kovačević [167], Strang and Nguyen [143] and Mallat [100], among others.

Wavelet Packets Wavelet packets were introduced in 1991 by Coifman, Meyer, Quake and Wickerhauser [31], followed by the widely cited [172] and [173]. In 1993, Ramchandran and Vetterli were the first to use a different cost measure for pruning wavelet packets, rate-distortion, as the one suitable for compression [117], and Saito and Coifman extended the idea further with local discriminant bases [124, 123].



Exercises with Solutions 
9.7 Introduction 

9.1. Orthogonality Relations for the Orthogonal Tree-Structured Filter Bank

Given an orthogonal two-channel filter bank with filters $g$ and $h$, prove the orthogonality relations (9.6a)–(9.6b), (9.11a)–(9.11d), as well as (9.14a)–(9.14b), for an octave-band filter bank, using arguments similar to those in the proof of (7.13).

Solution: One part of the proof is demonstrating equivalences between the time-domain left-hand sides and the right-hand sides involving the matrix domain, the z-transform domain and the Fourier domain. We do this only for the simple case involving a single filter, and then proceed to prove the orthogonality relations of the equivalent filters, given the orthogonality relations of the filters $g$ and $h$ in an orthogonal two-channel filter bank.

Equivalences: Demonstrating those expressions that involve a single filter, such as (9.6a) and (9.11a), follows the same procedure as for $N = 2$ in Section 2.7.5.

Instead of $N = 2$, in (9.6a) we have $N = 2^J$. Geometrically, the left-hand side of (9.6a) means that the columns of $G^{(J)}U_{2^J}$, the operator describing upsampling by $2^J$ followed by filtering by $G^{(J)}$, are orthonormal to each other, that is,

$$I = (G^{(J)}U_{2^J})^T(G^{(J)}U_{2^J}) = U_{2^J}^T (G^{(J)})^T G^{(J)} U_{2^J} = D_{2^J}(G^{(J)})^T G^{(J)} U_{2^J}.$$

As before, we can see the left-hand side of (9.6a) as the deterministic autocorrelation of $g^{(J)}$ downsampled by $2^J$. Write the deterministic autocorrelation of $g^{(J)}$ as in (2.16), $a^{(J)}_k = \langle g^{(J)}_n, g^{(J)}_{n-k}\rangle_n$. Because of the left-hand side of (9.6a), it has a single nonzero term modulo $2^J$, $a^{(J)}_0 = 1$, that is, $a^{(J)}_{2^J k} = \delta_k$.

Since we assume a real $g^{(J)}$, in the z-transform domain, $A^{(J)}(z) = G^{(J)}(z)G^{(J)}(z^{-1})$ using (2.142). Using the orthogonality of the roots of unity (2.277c), we can keep only the terms modulo $2^J$ by adding $A^{(J)}(W^k_{2^J}z)$ for $k = 0, 1, \ldots, 2^J - 1$ and dividing by $2^J$. Therefore, $a^{(J)}_{2^J k} = \delta_k$ can be expressed as

$$\sum_{k=0}^{2^J-1} A^{(J)}(W^k_{2^J}z) = \sum_{k=0}^{2^J-1} G^{(J)}(W^k_{2^J}z)\,G^{(J)}(W^{-k}_{2^J}z^{-1}) = 2^J,$$

which on the unit circle leads to

$$\sum_{k=0}^{2^J-1} \big|G^{(J)}(W^k_{2^J}e^{j\omega})\big|^2 = 2^J,$$

the quadrature mirror formula generalized to $N = 2^J$.

The above discussion demonstrates the equivalences in (9.6a), (9.6b), as well as (9.11a), (9.11c) with $N = 2^\ell$ (instead of $N = 2^J$).
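Both forms of the generalized quadrature mirror condition can be verified numerically for a concrete orthogonal filter. The sketch below assumes the length-4 Daubechies lowpass filter (any orthonormal lowpass filter would do) and $J = 3$. It checks the time-domain statement $a^{(J)}_{2^J k} = \delta_k$ and the frequency-domain statement $\sum_k |G^{(J)}(W^k_{2^J}e^{j\omega})|^2 = 2^J$ on a grid of frequencies. Since the sum runs over all $2^J$ shifts, the sign of the shift does not matter.

```python
import numpy as np

# Assumed orthonormal lowpass filter: Daubechies, length 4.
s3 = np.sqrt(3.0)
g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

def iterated_lowpass(g, J):
    """Coefficients of G^(J)(z) = prod_{k=0}^{J-1} G(z^(2^k))."""
    out = np.array([1.0])
    for k in range(J):
        up = np.zeros((len(g) - 1) * 2 ** k + 1)
        up[:: 2 ** k] = g
        out = np.convolve(out, up)
    return out

J = 3
N = 2 ** J
gJ = iterated_lowpass(g, J)

# Time domain: deterministic autocorrelation sampled at lags 0, 2^J, 2*2^J, ...
a = np.correlate(gJ, gJ, mode="full")       # lags -(len-1), ..., len-1
center = len(gJ) - 1                        # index of lag 0
a_down = a[center::N]

# Frequency domain: sum of |G^(J)|^2 over the 2^J shifted frequencies.
def dtft(f, w):
    return np.exp(-1j * np.outer(w, np.arange(len(f)))) @ f

w = np.linspace(0.0, 2 * np.pi, 257)
total = sum(np.abs(dtft(gJ, w + 2 * np.pi * k / N)) ** 2 for k in range(N))
print(a_down, total.min(), total.max())
```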

Orthogonality: To demonstrate orthogonality, we prove orthogonality of the lowpass filter (9.6a), an individual bandpass filter (9.11a), different bandpass filters (9.11b), as well as the lowpass and bandpass filters (9.14a), given that (7.13), (7.14) and (7.22) hold.

(i) We start with orthogonality of the lowpass filter (9.6a), which will also prove orthogonality of an individual bandpass filter (9.11a) (with substitution $\ell$ for $J$). This is equivalent to all the terms $z^{2^J k}$, $k \neq 0$, in $A^{(J)}(z)$ having zero coefficients. We write $A^{(J)}(z)$ as

$$\begin{aligned}
A^{(J)}(z) &= G^{(J)}(z)\,G^{(J)}(z^{-1}) \\
&\overset{(a)}{=} \big[G^{(J-1)}(z)\,G(z^{2^{J-1}})\big]\big[G^{(J-1)}(z^{-1})\,G(z^{-2^{J-1}})\big] \\
&= \big[G(z^{2^{J-1}})\,G(z^{-2^{J-1}})\big]\big[G^{(J-1)}(z)\,G^{(J-1)}(z^{-1})\big] \\
&= A(z^{2^{J-1}})\,A^{(J-1)}(z), \qquad (E9.1\text{-}1)
\end{aligned}$$

where (a) follows from (9.5c), and $A(z)$ is the deterministic autocorrelation of the lowpass filter $g$. This deterministic autocorrelation can also be expressed in the polyphase domain as

$$A(z) = 1 + z\,A_1(z^2), \qquad (E9.1\text{-}2)$$

with polyphase components $A_0(z) = 1$ and $A_1(z)$.$^{129}$ Expressing $A^{(J-1)}(z)$ in its polyphase form as well,

$$A^{(J-1)}(z) = 1 + \sum_{j=1}^{2^{J-1}-1} z^j A^{(J-1)}_j(z^{2^{J-1}}), \qquad (E9.1\text{-}3)$$


129 Because of the symmetry of $A(z)$ in $z$ and $z^{-1}$, we could have used either in front of the odd polyphase component $A_1$ in (E9.1-2); we chose $z$ for simplicity.
allows us to rewrite $A^{(J)}(z)$ from (E9.1-1) as

$$\begin{aligned}
A^{(J)}(z) &\overset{(a)}{=} A(z^{2^{J-1}})\,A^{(J-1)}(z) \\
&\overset{(b)}{=} \Big[1 + z^{2^{J-1}}A_1(z^{2^J})\Big]\Big[1 + \sum_{j=1}^{2^{J-1}-1} z^j A^{(J-1)}_j(z^{2^{J-1}})\Big] \\
&= 1 + z^{2^{J-1}}A_1(z^{2^J}) + \sum_{j=1}^{2^{J-1}-1} z^j A^{(J-1)}_j(z^{2^{J-1}}) + z^{2^{J-1}}A_1(z^{2^J})\sum_{j=1}^{2^{J-1}-1} z^j A^{(J-1)}_j(z^{2^{J-1}}), \qquad (E9.1\text{-}4)
\end{aligned}$$

where (a) follows from (E9.1-1), and (b) from (E9.1-2) and (E9.1-3). Apart from the constant 1, the first two terms in (E9.1-4) clearly contain no powers $z^{2^J k}$: the exponents of the second term are of the form $2^{J-1} + 2^J m$. The third term contains monomials with exponents of the form $j + 2^{J-1}m$, with $1 \le j \le 2^{J-1}-1$ and $m \in \mathbb{Z}$; these are not multiples of $2^{J-1}$, so the third term does not contain powers $z^{2^J k}$. The fourth term contains monomials with exponents of the form

$$2^{J-1} + 2^J m_0 + j + 2^{J-1}m_1 = j + 2^{J-1}(1 + 2m_0 + m_1), \qquad 1 \le j \le 2^{J-1}-1,\ m_0, m_1 \in \mathbb{Z},$$

which, for the same reason, are not multiples of $2^J$ either.
(ii) We now show orthogonality of different bandpass filters (9.11b). Without loss of generality, we assume $\ell < j$. Now, orthogonality will hold if and only if all the terms $z^{2^\ell k}$ in $C^{(\ell,j)}(z)$ have zero coefficients.

The deterministic crosscorrelation $C^{(\ell,j)}(z)$ can be written as

$$\begin{aligned}
C^{(\ell,j)}(z) &= H^{(\ell)}(z)\,H^{(j)}(z^{-1}) \\
&= \big[H(z^{2^{\ell-1}})G^{(\ell-1)}(z)\big]\big[H(z^{-2^{j-1}})G^{(j-1)}(z^{-1})\big] \\
&= \big[H(z^{2^{\ell-1}})G^{(\ell-1)}(z)\big]\big[H(z^{-2^{j-1}})G(z^{-2^{j-2}})\cdots G(z^{-2^{\ell-1}})G^{(\ell-1)}(z^{-1})\big] \\
&= \big[H(z^{2^{\ell-1}})G(z^{-2^{\ell-1}})\big]\big[H(z^{-2^{j-1}})G(z^{-2^{j-2}})\cdots G(z^{-2^{\ell}})\big]\big[G^{(\ell-1)}(z)G^{(\ell-1)}(z^{-1})\big] \\
&= C(z^{2^{\ell-1}})\,\big[H(z^{-2^{j-1}})G(z^{-2^{j-2}})\cdots G(z^{-2^{\ell}})\big]\,A^{(\ell-1)}(z).
\end{aligned}$$

From the orthogonality of $g$ and $h$, their deterministic crosscorrelation $C(z) = H(z)G(z^{-1})$ has only one nonzero polyphase component, $C_1(z)$, and thus

$$C^{(\ell,j)}(z) = \big[z^{2^{\ell-1}}C_1(z^{2^\ell})\big]\big[H(z^{-2^{j-1}})G(z^{-2^{j-2}})\cdots G(z^{-2^{\ell}})\big]\Big[1 + \sum_{i=1}^{2^{\ell-1}-1} z^i A^{(\ell-1)}_i(z^{2^{\ell-1}})\Big].$$
After expanding $C^{(\ell,j)}(z)$, every monomial has an exponent of one of two forms. Monomials coming from the constant 1 in the last bracket have exponents

$$2^{\ell-1} + m_0 2^\ell + m_1 2^{j-1} + m_2 2^{j-2} + \cdots + m_{j-\ell}2^{\ell} = 2^{\ell-1} + 2^\ell\big(m_0 + m_1 2^{j-\ell-1} + m_2 2^{j-\ell-2} + \cdots + m_{j-\ell}\big),$$

for integers $m_0, m_1, \ldots, m_{j-\ell}$; these are congruent to $2^{\ell-1}$ modulo $2^\ell$ and thus cannot be multiples of $2^\ell$. Monomials coming from the sum in the last bracket have exponents of the form $i + 2^{\ell-1}n$, with $1 \le i \le 2^{\ell-1}-1$ and $n \in \mathbb{Z}$, since all the other factors contribute only multiples of $2^{\ell-1}$; these are not multiples of $2^{\ell-1}$, and hence not multiples of $2^\ell$ either.
(iii) We end by showing orthogonality of lowpass and bandpass filters (9.14a). This is equivalent to showing that in $G^{(J)}(z)H^{(\ell)}(z^{-1})$, $\ell \le J$, all the terms $z^{2^\ell k}$ have zero coefficients. By rearranging terms, we get

$$G^{(J)}(z)\,H^{(\ell)}(z^{-1}) = A^{(\ell-1)}(z)\,\big[G(z^{2^{\ell-1}})\,H(z^{-2^{\ell-1}})\big]\,\big[G(z^{2^\ell})\cdots G(z^{2^{J-1}})\big], \qquad \ell \le J.$$

By substituting the expressions for $A^{(\ell-1)}(z)$ and $G(z^{2^{\ell-1}})H(z^{-2^{\ell-1}})$ from previous parts and expanding the terms, it is straightforward to verify that $G^{(J)}(z)H^{(\ell)}(z^{-1})$ contains no powers $z^{2^\ell k}$.

9.2. DWT from a 3-Channel Filter Bank

A 3-channel filter bank is iterated on the $G_0$ branch to form a 2-level decomposition, as shown in Figure E9.2-1.
Figure E9.2-1: A 2-level DWT with 3-channel filter banks.



(i) Draw the equivalent 5-channel synthesis filter bank, clearly specifying the transfer functions and sampling factors.
(ii) Draw the time-frequency tiling of the filter bank.
(iii) Write the basis matrix $\Phi$ for some 3-channel filter bank you specify (orthonormal or biorthogonal).
(iv) If the $G_i$ are third-band ideal filters as shown in Figure E9.2-2, draw the magnitude responses of the equivalent filters.
Figure E9.2-2: Third-band ideal filters, with magnitude responses $|G_i(e^{j\omega})|$ of height $\sqrt{3}$ and band edges at multiples of $\pi/3$.



Solution:

(i) Using the same process as in Section 9.2, the equivalent filters in a 5-channel filter bank are given in Figure E9.2-3, with

$$G^{(1)}_2(z) = G_2(z), \qquad G^{(2)}_2(z) = G_0(z)\,G_2(z^3),$$

$$G^{(1)}_1(z) = G_1(z), \qquad G^{(2)}_1(z) = G_0(z)\,G_1(z^3),$$

$$G^{(2)}_0(z) = G_0(z)\,G_0(z^3).$$



Figure E9.2-3: Equivalent 5-channel synthesis filter bank corresponding to Figure E9.2-1: two channels with upsampling by 3 followed by $G^{(1)}_2(z) = G_2(z)$ and $G^{(1)}_1(z) = G_1(z)$, and three channels with upsampling by 9 followed by $G^{(2)}_2(z) = G_0(z)G_2(z^3)$, $G^{(2)}_1(z) = G_0(z)G_1(z^3)$ and $G^{(2)}_0(z) = G_0(z)G_0(z^3)$.



(ii) The time-frequency tiling corresponding to Figure E9.2-3 is given in Figure E9.2-4.

Figure E9.2-4: Time-frequency tiling corresponding to Figure E9.2-1.

(iii) Choose the following biorthogonal filter bank: $G_0(z) = (1 + z^{-1} + z^{-2})/\sqrt{3}$, $G_1(z) = (1 - \cdots + z^{-2})/\sqrt{3}$, $G_2(z) = (1 - z^{-2})/\sqrt{3}$. The synthesis filters and the basis matrix then follow from this choice.

(iv) The magnitude responses of the equivalent filters are given in Figure E9.2-5.

9.3. Biorthogonality Relations for the Biorthogonal DWT

Formulate and prove the appropriate biorthogonality relations corresponding to their orthogonal counterparts in (9.6a)–(9.6b), (9.11a)–(9.11c), as well as (9.14a)–(9.14b).

Solution: TBD.

9.4. Wavelet Packets with Linear Phase Filters

Show that in a filter bank with linear-phase filters, the iterated filters are also linear phase. In particular, consider a two-channel filter bank with $g_0$ and $g_1$ of even length, symmetric and antisymmetric, respectively. Take an equivalent 4-channel filter bank, with $G^{(2)}_0(z) = G_0(z)G_0(z^2)$, $G^{(2)}_1(z) = G_0(z)G_1(z^2)$, $G^{(2)}_2(z) = G_1(z)G_0(z^2)$, $G^{(2)}_3(z) = G_1(z)G_1(z^2)$, as in (9.23). What are the lengths and symmetries of these four filters?



Solution: A linear-phase filter satisfies (2.11), or, in the z-domain,

$$H(z) = \pm z^{-L+1}H(z^{-1}). \qquad (E9.4\text{-}1)$$



In an iterated filter bank, filters at depth $\ell$ have the form (9.5a) (where, for wavelet packets, we have an arbitrary combination of lowpass and highpass filters). We now show that if $g$
Figure E9.2-5: Magnitude responses of the equivalent filters corresponding to Figure E9.2-3 if the filters in the 3-channel filter bank are third-band ideal.



is linear phase, so is $g^{(J)}$ (the proof would follow similarly for any other iterated filter):

$$\begin{aligned}
G^{(J)}(z) &= \prod_{\ell=0}^{J-1} G(z^{2^\ell}) \overset{(a)}{=} \prod_{\ell=0}^{J-1} \pm (z^{2^\ell})^{-L+1}\,G(z^{-2^\ell}) = (\pm 1)^J z^{-(L-1)\sum_{\ell=0}^{J-1}2^\ell} \prod_{\ell=0}^{J-1} G(z^{-2^\ell}) \\
&\overset{(b)}{=} (\pm 1)^J z^{-(L-1)\sum_{\ell=0}^{J-1}2^\ell}\,G^{(J)}(z^{-1}) \overset{(c)}{=} (\pm 1)^J z^{-(L-1)(2^J-1)}\,G^{(J)}(z^{-1}) \overset{(d)}{=} (\pm 1)^J z^{-L^{(J)}+1}\,G^{(J)}(z^{-1}),
\end{aligned}$$

where (a) follows from each filter (in this case the same filter) being linear phase as in (E9.4-1); (b) from (9.5a); (c) from the geometric series formula (P1.65-1); and (d) from the expression for the total length of the iterated filter (9.5b), $L^{(J)} = (L-1)(2^J-1)+1$. This shows that the equivalent iterated filter is linear phase as well.

The filters $G_0(z)$ and $G_1(z)$ are of even length $L$, symmetric and antisymmetric, respectively. Then, $G_0(z^2)$ and $G_1(z^2)$ are of length $2L - 1$, which is odd. Since the convolution of a length-$L$ filter with a length-$(2L-1)$ filter has length $3L - 2$, which is even for even $L$, all of the level-2 filters are of even length $3L - 2$, in agreement with (9.5b).

Next, $G_1(z^2)$ has the same symmetry properties as $G_1(z)$, and thus:

(i) $G^{(2)}_0(z)$ is the product of two symmetric filters and is thus symmetric.

(ii) $G^{(2)}_1(z)$ is the product of a symmetric and an antisymmetric filter and is thus antisymmetric.

(iii) $G^{(2)}_2(z)$ is the product of an antisymmetric and a symmetric filter and is thus antisymmetric.

(iv) $G^{(2)}_3(z)$ is the product of two antisymmetric filters and is thus symmetric.
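The lengths and symmetries in this solution can be checked mechanically. The sketch below builds the four level-2 products of (9.23) for two even-length pairs: Haar ($L = 2$) and the pair $[1\ 3\ 3\ 1]$, $[1\ 3\ {-3}\ {-1}]$ ($L = 4$). The second pair is chosen only for its symmetry, not because it forms a valid filter bank, which is all this check needs. In both cases the products have even length $3L - 2$, with symmetries symmetric, antisymmetric, antisymmetric, symmetric. The helper names are hypothetical.

```python
import numpy as np

def up2(f):
    """Coefficients of F(z^2): length 2L - 1 for a length-L filter."""
    out = np.zeros(2 * len(f) - 1)
    out[::2] = f
    return out

def symmetry(f, tol=1e-12):
    """Classify a real filter by comparing it with its time reversal."""
    if np.allclose(f, f[::-1], atol=tol):
        return "symmetric"
    if np.allclose(f, -f[::-1], atol=tol):
        return "antisymmetric"
    return "neither"

def level2_packets(g0, g1):
    """G_0^(2), ..., G_3^(2) in the ordering of (9.23)."""
    return [np.convolve(g0, up2(g0)), np.convolve(g0, up2(g1)),
            np.convolve(g1, up2(g0)), np.convolve(g1, up2(g1))]

for g0, g1 in [(np.array([1.0, 1.0]), np.array([1.0, -1.0])),
               (np.array([1.0, 3.0, 3.0, 1.0]), np.array([1.0, 3.0, -3.0, -1.0]))]:
    pk = level2_packets(g0, g1)
    print(len(g0), [len(f) for f in pk], [symmetry(f) for f in pk])
```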

9.5. Complexity of Computing a J -Level, Full-Tree Filter Bank 

Given is a full two-channel tree decomposition with J = log 2 (A r ) levels, N = 2 J , filters of 
length L at every stage and signals of length N. In (9.30) and ( [9.29] ). we gave two different 
expressions for the complexity of the full-tree implementation. 

We now consider a couple of different options, both involving operations in the Fourier domain. Compute the overall complexity of the following two scenarios: first, for each, compute the DFT $X_k$ of the input signal $x_n$ using an FFT. Then:

(i) Filter the input in the Fourier domain using $2^J$ equivalent filters. (While we also need to take the DFT of these filters, that can be done in advance.)
(ii) At each decomposition level of the tree, compute the Fourier transform $A_k$ of the filtered and downsampled signal $a_n$ as
$$ A_k = \tfrac{1}{2}\big(G_k X_k + G_{k+N/2} X_{k+N/2}\big) $$
for $0 \le k \le N/2 - 1$. The output signal $A_k$ is of length $N/2$.

Repeat the procedure at every stage of the wavelet tree, but each time at half 
the sampling rate (thus halving the complexity of the previous stage). 

Which of the four implementations we considered (two with time-domain filtering, (9.30) and (9.29), and two with Fourier-domain filtering, (i) and (ii)) has the lowest complexity? Does it depend on the number of stages $J$, filter length $L$ and/or signal length $N$?

Solution: The complexity of computing the FFT $X_k$ of the input signal $x_n$ is $O(N \log N)$.

(i) The filtering operations can be performed as multiplications in the Fourier domain. This requires $N$ complex multiplications for each of the $2^J$ filters, because the filters must have the same length as the input signal. Next, the filtered signals have to be downsampled. Downsampling by 2 can be evaluated as
$$ A_k = \tfrac{1}{2}\big(X_k + X_{k+N/2}\big), \qquad 0 \le k \le N/2 - 1, $$
and requires $N/2$ additions. Similarly, downsampling by $2^\ell$ can be evaluated as
$$ A_k = \frac{1}{2^\ell} \sum_{m=0}^{2^\ell - 1} X_{k + mN/2^\ell}, \qquad 0 \le k \le \frac{N}{2^\ell} - 1, $$
and requires $(2^\ell - 1)N/2^\ell$ additions. On the synthesis side, upsampling does not require any special operations, synthesis filtering is similar to the analysis filtering, and the inverse FFT again has complexity $O(N \log N)$, resulting in the overall complexity of
$$ C_a = N\log N + 2^J N + 2^J N - 2N + 2^J N + N\log N = 2N\log N + 3N^2 - 2N \approx 2N\log N + 3N^2, $$
where we used $2^J = N$.
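The downsampling formula above can be checked directly. The NumPy sketch below computes the DFT of $x_{2^\ell n}$ from the full-length DFT $X_k$ by summing the $2^\ell$ aliased copies, and compares the result with an FFT of the downsampled signal. It assumes NumPy's DFT convention $X_k = \sum_n x_n e^{-j2\pi kn/N}$. The helper name `downsample_in_dft` is hypothetical.

```python
import numpy as np

def downsample_in_dft(X, D):
    """DFT of x[D n], computed from X = DFT(x): A_k = (1/D) sum_m X[k + m N/D]."""
    N = len(X)
    assert N % D == 0
    # row m of the reshaped array holds X[m N/D : (m + 1) N/D], i.e. X[k + m N/D]
    return X.reshape(D, N // D).sum(axis=0) / D

rng = np.random.default_rng(1)
N = 32
x = rng.standard_normal(N)
X = np.fft.fft(x)
for ell in range(1, 4):
    D = 2 ** ell
    assert np.allclose(downsample_in_dft(X, D), np.fft.fft(x[::D]))
```

For $D = 2^\ell$ the sum adds $2^\ell$ terms for each of the $N/2^\ell$ outputs, which gives the $(2^\ell - 1)N/2^\ell$ additions counted above.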

(ii) The computation of $A_k$ as
$$ A_k = \tfrac{1}{2}\big(G_k X_k + G_{k+N/2} X_{k+N/2}\big), \qquad 0 \le k \le N/2 - 1, $$
requires $N$ multiplications and $N/2$ additions. When we repeat this at the different decomposition levels, we need
$$ N + \frac{N}{2} + \frac{N}{4} + \cdots < 2N $$
operations (the synthesis side has similar complexity). The overall complexity is thus
$$ C_b = 2(N\log N + 2N) = 2N\log N + 4N. $$
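The identity used in (ii) says that filtering followed by downsampling by 2 takes one line in the DFT domain. The NumPy sketch below checks it against a direct time-domain computation. It assumes the filter is zero-padded to length $N$ and that filtering is circular (periodic) convolution, which is what multiplying DFTs implements. The signal and filter are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
x = rng.standard_normal(N)
g = np.zeros(N)
g[:4] = rng.standard_normal(4)            # length-4 filter, zero-padded to N

X, G = np.fft.fft(x), np.fft.fft(g)
half = N // 2
# Fourier-domain filtering and downsampling by 2, as in (ii)
A = 0.5 * (G[:half] * X[:half] + G[half:] * X[half:])

# Time-domain reference: circular convolution, then keep the even-indexed samples
y = np.array([sum(g[m] * x[(n - m) % N] for m in range(N)) for n in range(N)])
assert np.allclose(A, np.fft.fft(y[::2]))
```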

Both algorithms from (i) and (ii) have lower complexity than their counterparts from (9.30) and (9.29). Which one is less complex depends on the signal and filter lengths. They have the same complexity for
$$ 4NL = 4N + 2N\log N \;\Rightarrow\; L = 1 + \frac{\log N}{2}. $$
With shorter filters, the time-domain algorithm has lower complexity; with filters longer than $1 + (\log N)/2$, the frequency-domain algorithm is more efficient.
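The crossover is plain arithmetic. The sketch below evaluates both operation counts used in the comparison above, $4NL$ for the time-domain tree and $C_b = 2N\log N + 4N$ for (ii). It assumes the logarithm is base 2, consistent with $J = \log_2 N$. The function names are hypothetical.

```python
import math

def time_domain_cost(N, L):
    """Operation count 4 N L for the time-domain tree, as used in the comparison."""
    return 4 * N * L

def fourier_domain_cost(N):
    """C_b = 2 N log2 N + 4 N from (ii)."""
    return 2 * N * math.log2(N) + 4 * N

N = 1024                                  # J = 10 levels
crossover = 1 + math.log2(N) / 2          # L = 1 + (log2 N) / 2
for L in (4, 6, 8):
    t, f = time_domain_cost(N, L), fourier_domain_cost(N)
    winner = "time" if t < f else ("fourier" if f < t else "tie")
    print(L, t, f, winner)
```

For $N = 1024$, the crossover lies at $L = 6$. Shorter filters favor the time domain and longer ones favor the Fourier domain, as stated above.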



Exercises 

9.8 Introduction 

9.1. Iteration of an Upsampled Interpolation Filter 

Given is the system as in Figure P9.1-1, where $G(z)$ is the following interpolation filter:
$$ G(z) = \tfrac{1}{2}z + 1 + \tfrac{1}{2}z^{-1}. $$
[Figure: upsampling by 2, then $G(z)$, then upsampling by 2, then $G(z)$.]
Figure P9.1-1: System for Exercise 9.1.

[Figure: upsampling by 4, then $G(z)G(z^2)$.]
Figure P9.1-2: System equivalent to that of Figure P9.1-1.



(i) Prove that the system in Figure P9.1-1 is equivalent to that of Figure P9.1-2.
(ii) Iterate the system in Figure P9.1-3 $N$ times and give the equivalent filter in the z-transform domain.

[Figure: upsampling by 2, then $G(z)$.]
Figure P9.1-3: System equivalent to that of Figure P9.1-1.

(iii) For $N = 2$, sketch the impulse response of the equivalent filter from (ii) and explain why it is called a linear interpolator.
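Before proving part (i), you can check it numerically. The NumPy sketch below pushes a random input through both systems and compares the outputs. It uses the causal version $z^{-1}G(z)$, with taps $[1/2, 1, 1/2]$, so that plain convolution applies. The same delay enters both paths, so this choice does not affect the comparison. The helper `upsample` is hypothetical, and the check is a sanity test, not a proof.

```python
import numpy as np

def upsample(x, M):
    """Insert M - 1 zeros after every sample."""
    y = np.zeros(M * len(x))
    y[::M] = x
    return y

g = np.array([0.5, 1.0, 0.5])              # causal version z^{-1} G(z)

rng = np.random.default_rng(3)
x = rng.standard_normal(10)

# Figure P9.1-1: up 2, G(z), up 2, G(z)
y1 = np.convolve(upsample(np.convolve(upsample(x, 2), g), 2), g)

# Figure P9.1-2: up 4, then G(z) G(z^2)
g_z2 = upsample(g, 2)[:-1]                 # taps of G(z^2): [0.5, 0, 1, 0, 0.5]
h = np.convolve(g, g_z2)
y2 = np.convolve(upsample(x, 4), h)

assert np.allclose(y1, y2)
```

Plotting `h` is also a quick way to check your sketch for part (iii).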

9.2. Equivalent Filters 

Given the expression for the equivalent filter as in (9.5a), prove the following recursive relations:

(i) $G^{(J)}(z) = G(z)\, G^{(J-1)}(z^2)$.

(ii) $G^{(J)}(z) = G\big(z^{2^{J-1}}\big)\, G^{(J-1)}(z)$.

(iii) G^Hz) = GP'-'lWGP'-'lf/'" 1 ). 

9.3. Number of Wavelet Packets 

Prove that the number of possible wavelet packet bases generated from a depth-$L$ binary tree is as given in (9.26).

9.4. Wavelet Packets 

Given is the following synthesis wavelet packet tree: 




Figure P9.4-1: An arbitrary wavelet packet tree. 
(i) Draw the equivalent time-frequency tiling induced by such a wavelet packet tree.
(ii) Identify the equivalent 4-channel synthesis filter bank and compute the equivalent filters. You may assume that the filters in each two-channel filter bank are Haar from Table El, that is, $G(z) = (1 + z^{-1})/\sqrt{2}$ and $H(z) = (1 - z^{-1})/\sqrt{2}$.
(iii) Identify the basis functions of such an expansion. Does it implement an orthonormal basis? Why?

9.5. Orthogonal Full-Tree Filter Banks
Given is a full-tree filter bank of depth 2.

(i) Assume ideal sinc filters, and give the frequency response magnitude of $G_i(e^{j\omega})$, $i = 0, 1, 2, 3$. Note that this is not the natural ordering one would expect.

(ii) Now take the Haar filters, and give $g_i$, $i = 0, 1, 2, 3$. These are the discrete-time Walsh-Hadamard functions of length 4.

(iii) Given that $\{g_0, g_1\}$ is an orthogonal pair, prove orthogonality for any of the equivalent filters with respect to shifts by 4.
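For parts (ii) and (iii), a short NumPy sketch can confirm your answer. It forms the four depth-2 Haar equivalent filters as products $G_i(z)\,G_j(z^2)$ and checks that they are orthonormal. The labeling of the pairs $(i, j)$ is an illustrative assumption. Because each equivalent filter has length 4, its shifts by 4 do not overlap, so orthogonality at shift 0 is the only nontrivial case here.

```python
import numpy as np

g0 = np.array([1.0, 1.0]) / np.sqrt(2)     # Haar lowpass
g1 = np.array([1.0, -1.0]) / np.sqrt(2)    # Haar highpass

def up2(g):
    """Taps of G(z^2)."""
    out = np.zeros(2 * len(g) - 1)
    out[::2] = g
    return out

# Depth-2 equivalent filters G_i(z) G_j(z^2), one per row
W = np.array([np.convolve(a, up2(b)) for a in (g0, g1) for b in (g0, g1)])

assert W.shape == (4, 4)
assert np.allclose(W @ W.T, np.eye(4))     # orthonormal rows
```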

9.6. 6-Channel Filter Bank

Given a filter bank specified by the following subband signal equations:
$$
\begin{aligned}
x^{(1)} &= D_2 G D_2 G x, & x^{(2)} &= D_2 G D_2 H D_2 G x, & x^{(3)} &= D_2 H D_2 H D_2 G x,\\
x^{(4)} &= D_2 G D_2 G D_2 H x, & x^{(5)} &= D_2 H D_2 G D_2 H x, & x^{(6)} &= D_2 H D_2 H x,
\end{aligned}
$$
where $G$ and $H$ are the lowpass and highpass filter operators, respectively, and $D_2$ is the downsampling-by-2 operator.

(i) Draw the block diagram of the system using two-channel filter banks.
(ii) Draw the equivalent time-frequency tiling induced by the 6-channel filter bank.
(iii) Draw the equivalent single-level 6-channel filter bank, clearly specifying the downsampling factors and the equivalent filters in each branch.
Redundancy is a common tool in our daily lives; it helps remove doubt or uncertainty. Redundant signal representations follow the same idea to create robustness. Given a sequence, we often represent it in another domain where its characteristics are more readily apparent in the expansion coefficients. If the representation in that other domain is achieved via a basis, corruption or loss of expansion coefficients can be serious. If, on the other hand, it is achieved via a redundant representation, such problems can be avoided.

As introduced in Chapter 1, Section 1.5.4, the redundant counterparts of bases are called frames. Frames are the topic of the present chapter. The building blocks of a representation can be seen as words in a dictionary; while a basis uses a minimum number of such words, a frame uses an overcomplete set. This is similar to having multiple words with slight variations for similar concepts, which allows very short sentences to describe complex ideas.¹³⁰ While in most of the previous chapters our emphasis was on finding the best expansion/representation vectors (Fourier, wavelet, etc.), frames allow us even more freedom: not only do we look for the best expansion, we can also look for the best expansion coefficients given a fixed expansion and under desired constraints (sparsity, for example). This freedom is due to the fact that in a redundant dictionary the expansion coefficients are not unique, while in a basis they are.

130 As the urban legend goes, Eskimos have a hundred words for snow.

We briefly introduced frames in $\mathbb{R}^2$ in Chapter 1, Section 1.1. We followed this by a more formal discussion in Section 1.5.4. Our goal in this chapter is to explore the potential of such overcomplete (redundant) representations in specific settings; in particular, we study fundamental properties of frames, both in finite dimensions and in $\ell^2(\mathbb{Z})$. We look into designing frames with some structure, especially those implementable by oversampled filter banks, and more specifically those with Fourier-like or wavelet-like time-frequency behavior. We end the chapter with a discussion of computational aspects related to frame expansions.

Notation used in this chapter: We consider both real-coefficient and complex-coefficient frames here, unlike, for example, in Chapter 7. When Hermitian conjugation is applied to polyphase matrices, it is applied only to coefficients and not to $z$.



10.1 Introduction 

Redundant sets of vectors look like an overcomplete basis,¹³¹ and we call such sets frames. Thus, a frame is an extension of a basis, where, for a given space, more vectors than necessary are used to obtain an expansion with desirable properties. In this section, we use the two frame examples from Chapter 1, Section 1.1, to introduce and discuss frame concepts in a simple setting but in more detail. For ease of presentation, we will repeat pertinent equations as well as figures.



A Tight Frame for $\mathbb{R}^2$

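Our first example is that from (1.15), a set of three vectors in $\mathbb{R}^2$:
$$
\varphi_0 = \begin{bmatrix} \sqrt{2/3} \\ 0 \end{bmatrix}, \quad
\varphi_1 = \begin{bmatrix} -1/\sqrt{6} \\ 1/\sqrt{2} \end{bmatrix}, \quad
\varphi_2 = \begin{bmatrix} -1/\sqrt{6} \\ -1/\sqrt{2} \end{bmatrix}. \tag{10.1}
$$

Expansion These vectors clearly span $\mathbb{R}^2$ since any two of them do (see Figure 10.1). We want to represent a vector $x \in \mathbb{R}^2$ as a linear combination of $\{\varphi_i\}_{i=0,1,2}$:
$$ x = \sum_{i=0}^{2} \alpha_i \varphi_i. \tag{10.2} $$
How do we find the coefficients $\alpha_i$?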



Let us gather these vectors into a frame matrix $\Phi$,
$$
\Phi = \begin{bmatrix} \varphi_0 & \varphi_1 & \varphi_2 \end{bmatrix}
= \begin{bmatrix} \sqrt{2/3} & -1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}. \tag{10.3}
$$

Figure 10.1: The three vectors $\{\varphi_0, \varphi_1, \varphi_2\}$ from (10.1) form a frame for $\mathbb{R}^2$ (the same as Figure 1.3(b)).

131 Even though this term is a contradiction.



To compute the expansion coefficients $\alpha_i$ in (10.2), we need a right inverse of $\Phi$. Since $\Phi$ is rectangular, such an inverse is not unique, so we look for the simplest one. As the rows of $\Phi$ are orthonormal, $\Phi\Phi^T = I_2$, and thus a possible right inverse of $\Phi$ is just its transpose:
$$
\Phi^T = \begin{bmatrix} \sqrt{2/3} & 0 \\ -1/\sqrt{6} & 1/\sqrt{2} \\ -1/\sqrt{6} & -1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} \varphi_0^T \\ \varphi_1^T \\ \varphi_2^T \end{bmatrix}. \tag{10.4}
$$
Gathering the expansion coefficients into a vector $\alpha$,
$$ \alpha = \Phi^T x, \tag{10.5} $$
and using the fact that $\Phi\alpha = \Phi\Phi^T x = x$, we obtain the following expansion formula:
$$ x = \sum_{i=0}^{2} \langle x, \varphi_i \rangle \varphi_i, \tag{10.6} $$
for all $x \in \mathbb{R}^2$, which looks exactly like the usual orthonormal expansion (1.85a), except for the number of vectors involved.
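The expansion can be reproduced in a few lines. The NumPy sketch below builds the frame matrix (10.3), checks that $\Phi^T$ is a right inverse, and reconstructs a vector from its three coefficients via (10.5) and (10.6). The test vector is arbitrary.

```python
import numpy as np

# Frame matrix (10.3): columns are phi_0, phi_1, phi_2 from (10.1)
Phi = np.array([[np.sqrt(2 / 3), -1 / np.sqrt(6), -1 / np.sqrt(6)],
                [0.0,             1 / np.sqrt(2), -1 / np.sqrt(2)]])

assert np.allclose(Phi @ Phi.T, np.eye(2))           # rows orthonormal: Phi^T is a right inverse

x = np.array([0.3, -1.2])
alpha = Phi.T @ x                                    # (10.5): alpha_i = <x, phi_i>
x_rec = sum(alpha[i] * Phi[:, i] for i in range(3))  # (10.6)
assert np.allclose(x_rec, x)
```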
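Geometry of the Expansion Let us understand why this is so. The rows of $\Phi$ are orthonormal,
$$ \Phi\Phi^T = I_2. \tag{10.7} $$
Actually, $\Phi$ can be seen as two rows of a unitary matrix whose third row is orthogonal to the rows of $\Phi$. We call that third row $\Phi^\perp$, that is, $\Phi^\perp = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$. That unitary matrix is then
$$
\begin{bmatrix} \Phi \\ \Phi^\perp \end{bmatrix}
= \begin{bmatrix} \sqrt{2/3} & -1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}. \tag{10.8}
$$
We can then write
$$ \Phi\Phi^T = I_{2\times 2}, \tag{10.9a} $$
$$ \Phi(\Phi^\perp)^T = 0_{2\times 1}, \tag{10.9b} $$
$$ \Phi^\perp(\Phi^\perp)^T = I_{1\times 1} = 1. \tag{10.9c} $$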



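Calling $S$ the subspace of $\mathbb{R}^3$ spanned by the columns of $\Phi^T$, and $S^\perp$ its orthogonal complement in $\mathbb{R}^3$ (spanned by the one column of $(\Phi^\perp)^T$), we can write
$$
S = \operatorname{span}(\Phi^T) = \operatorname{span}\left( \begin{bmatrix} \sqrt{2/3} \\ -1/\sqrt{6} \\ -1/\sqrt{6} \end{bmatrix}, \begin{bmatrix} 0 \\ 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix} \right),
\qquad
S^\perp = \operatorname{span}\big((\Phi^\perp)^T\big) = \operatorname{span}\left( \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \right).
$$

We just saw that the rows of $\Phi$ are orthonormal; moreover, while not of unit norm, the columns of $\Phi$ are of the same norm, $\|\varphi_i\| = \sqrt{2/3}$. Therefore, $\Phi$ is a very special matrix, as can be guessed by looking at Figure 10.1.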
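The geometric statements (10.8) and (10.9a)–(10.9c) reduce to one matrix identity: stacking $\Phi$ and $\Phi^\perp$ gives a unitary (here real orthogonal) $3\times 3$ matrix. The NumPy sketch below checks this identity, and also checks that the columns of $\Phi$ have equal norm $\sqrt{2/3}$.

```python
import numpy as np

Phi = np.array([[np.sqrt(2 / 3), -1 / np.sqrt(6), -1 / np.sqrt(6)],
                [0.0,             1 / np.sqrt(2), -1 / np.sqrt(2)]])
Phi_perp = np.ones((1, 3)) / np.sqrt(3)

U = np.vstack([Phi, Phi_perp])                 # the 3x3 matrix in (10.8)
assert np.allclose(U @ U.T, np.eye(3))         # encodes (10.9a)-(10.9c) at once
assert np.allclose(np.linalg.norm(Phi, axis=0), np.sqrt(2 / 3))   # equal-norm columns
```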

Let us now understand the nature of the expansion coefficients $\alpha$ a bit more in depth. Obviously, $\alpha$ cannot be arbitrary; since $\alpha = \Phi^T x$, it belongs to the range of the columns of $\Phi^T$, or $\alpha \in S$. What about some arbitrary $\alpha' \in \mathbb{R}^3$? As the expansion coefficients must belong to $S$, we can calculate the orthogonal projection of $\alpha'$ onto $S$ by first calculating some $x' = \Phi\alpha'$, and then computing the unique orthogonal projection we call $\alpha$ as
$$ \alpha = \Phi^T x' = \Phi^T\Phi\,\alpha' = G\alpha', $$
where $G = \Phi^T\Phi$ is the Gram matrix from (1.112),
$$
G = \Phi^T\Phi \overset{(a)}{=} \frac{1}{3}\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}, \tag{10.10}
$$
and (a) follows from (10.3) and (10.4). We can therefore express any $\alpha' \in \mathbb{R}^3$ as
$$ \alpha' = \alpha + \alpha^\perp, \tag{10.11a} $$
with $\alpha \in S$ and $\alpha^\perp \in S^\perp$, and thus
$$ \langle \alpha, \alpha^\perp \rangle = 0. \tag{10.11b} $$



Therefore, we see that, for our $\Phi$, many different $\alpha'$ are possible as expansion coefficients. In other words, $\Phi^T$ is not the only possible right inverse of $\Phi$. While this is not surprising, it allows frames to be extremely flexible, a fact we will explore in detail in the next section. Throughout this chapter, when we write $\alpha'$, we will mean any vector of expansion coefficients; in contrast, $\alpha$ will be the unique one obtained using the canonical (unique) dual frame.
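The decomposition (10.11a) is easy to see numerically. In the NumPy sketch below, an arbitrary $\alpha' \in \mathbb{R}^3$ (values chosen only for illustration) is projected onto $S$ with the Gram matrix (10.10). The residual $\alpha^\perp$ is orthogonal to $\alpha$ and is annihilated by $\Phi$. As a result, $\alpha'$ and $\alpha$ synthesize the same $x$.

```python
import numpy as np

Phi = np.array([[np.sqrt(2 / 3), -1 / np.sqrt(6), -1 / np.sqrt(6)],
                [0.0,             1 / np.sqrt(2), -1 / np.sqrt(2)]])
G = Phi.T @ Phi                                   # Gram matrix (10.10)
assert np.allclose(3 * G, [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]])

alpha_prime = np.array([1.0, 2.0, -0.5])          # arbitrary coefficients in R^3
alpha = G @ alpha_prime                           # orthogonal projection onto S
alpha_perp = alpha_prime - alpha                  # component in S-perp, (10.11a)

assert np.isclose(alpha @ alpha_perp, 0.0)        # (10.11b)
assert np.allclose(Phi @ alpha_perp, 0.0)         # S-perp is invisible to synthesis
assert np.allclose(Phi @ alpha_prime, Phi @ alpha)
```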

Energy of Expansion Coefficients For orthonormal bases, Parseval's equality (energy conservation) (1.87a) is fundamental. To find out what happens here, we compute the norm of $\alpha$,
$$ \|\alpha\|^2 = \alpha^T\alpha \overset{(a)}{=} x^T\Phi\Phi^T x \overset{(b)}{=} x^T x = \|x\|^2, \tag{10.12} $$
where (a) follows from (10.5) and (b) from (10.7), again formally the same as for an orthonormal basis. Beware, though, that the comparison is not entirely fair, as the frame vectors are not of unit norm; we will see in a moment what happens when this is the case.
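A short NumPy check of (10.12) on random vectors follows. The coefficient energy $\|\Phi^T x\|^2$ matches $\|x\|^2$ exactly, even though each frame vector has norm $\sqrt{2/3}$ rather than 1.

```python
import numpy as np

Phi = np.array([[np.sqrt(2 / 3), -1 / np.sqrt(6), -1 / np.sqrt(6)],
                [0.0,             1 / np.sqrt(2), -1 / np.sqrt(2)]])
rng = np.random.default_rng(4)
for _ in range(5):
    x = rng.standard_normal(2)
    alpha = Phi.T @ x
    assert np.isclose(alpha @ alpha, x @ x)       # (10.12): ||alpha||^2 = ||x||^2
```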



Robustness to Corruption and Loss What does the redundancy of this expansion buy us? For example, what if the expansion coefficients get corrupted by noise? Assume, for instance, that $\alpha$ is perturbed by noise $\eta'$, where the noise components $\eta'_i$ are uncorrelated and of equal variance, with $\|\eta'\| = 1$. Write $\eta' = \eta + \eta^\perp$ with $\eta \in S$ and $\eta^\perp \in S^\perp$. Reconstruction projects the noise onto $S$, and thus cancels the part of $\eta'$ not in $S$:
$$ y = \Phi(\alpha + \eta') = x + \Phi\eta + \Phi\eta^\perp = x + \Phi\eta. $$
To compute $\|\Phi\eta\|^2$, we write
$$ \|\Phi\eta\|^2 = \|\Phi\eta'\|^2 = \eta'^T\Phi^T\Phi\,\eta' = \eta'^T U\Sigma U^T\eta', $$
where we have performed a singular value decomposition (1.213) on $G = \Phi^T\Phi$ as
$$ G = U\Sigma U^T, \qquad \Sigma = \operatorname{diag}(1, 1, 0), $$
and $U$ is a unitary matrix. So $\nu = U^T\eta'$ again has $\|\nu\| = 1$ and uncorrelated components of equal variance, and $\|\Phi\eta\|^2 = \nu_0^2 + \nu_1^2$, which on average equals $(2/3)\|\eta'\|^2$. We have thus established that, on average, the energy of the noise gets reduced during reconstruction by the contraction factor 2/3.
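The factor 2/3 is an average, so a Monte Carlo simulation shows it well. The NumPy sketch below assumes white Gaussian noise on the three coefficients, normalized so that $\|\eta'\| = 1$ in each trial. It measures the mean energy that survives reconstruction, which should come out close to 2/3.

```python
import numpy as np

Phi = np.array([[np.sqrt(2 / 3), -1 / np.sqrt(6), -1 / np.sqrt(6)],
                [0.0,             1 / np.sqrt(2), -1 / np.sqrt(2)]])
rng = np.random.default_rng(5)
trials = 100_000
eta = rng.standard_normal((trials, 3))              # white noise on the 3 coefficients
eta /= np.linalg.norm(eta, axis=1, keepdims=True)   # normalize so ||eta'|| = 1

recon_noise = eta @ Phi.T                           # row t is Phi @ eta'_t
ratio = np.mean(np.sum(recon_noise ** 2, axis=1))   # average ||Phi eta'||^2
print(ratio)                                        # close to 2/3
```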
We have just looked at the effect of noise on the reconstructed vector in a frame expansion. We now ask a different question: What happens if, for some reason, we have access to only two out of three expansion coefficients during reconstruction (for example, one was lost)? Since any two of the vectors form a basis, we will still be able to reconstruct; however, the reconstruction is now performed differently. For example, assume we lost the first expansion coefficient $\alpha_0$. To reconstruct, we must behave as if we had started without the first vector $\varphi_0$ and had computed the expansion coefficients using only $\varphi_1$ and $\varphi_2$. This further means that to reconstruct, we must find the inverse of the $2\times 2$ submatrix of $\Phi^T$ formed by taking its last two rows. This new reconstruction matrix $\Phi^e$ (where $e$ stands for erasures) is
$$
\Phi^e = \begin{bmatrix} -1/\sqrt{6} & 1/\sqrt{2} \\ -1/\sqrt{6} & -1/\sqrt{2} \end{bmatrix}^{-1}
= \begin{bmatrix} -\sqrt{3}/\sqrt{2} & -\sqrt{3}/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix},
$$
and thus multiplying $[\alpha_1\ \alpha_2]^T$ by $\Phi^e$ reconstructs the input vector:
$$
\Phi^e \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}
= \begin{bmatrix} -\sqrt{3}/\sqrt{2} & -\sqrt{3}/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}
\begin{bmatrix} -1/\sqrt{6} & 1/\sqrt{2} \\ -1/\sqrt{6} & -1/\sqrt{2} \end{bmatrix} x = x.
$$

Unit-Norm Version The frame we have just seen is a very particular frame and intuitively close to an orthonormal basis. However, there is one difference: while all the frame vectors are of the same norm $\sqrt{2/3}$, they are not of unit norm. We can normalize the $\varphi_i$ to be of norm 1, leading to
$$
\Phi = \begin{bmatrix} 1 & -1/2 & -1/2 \\ 0 & \sqrt{3}/2 & -\sqrt{3}/2 \end{bmatrix}, \qquad
\Phi^T = \begin{bmatrix} 1 & 0 \\ -1/2 & \sqrt{3}/2 \\ -1/2 & -\sqrt{3}/2 \end{bmatrix}, \tag{10.13}
$$
yielding the expansion
$$ x = \frac{2}{3}\sum_{i=0}^{2} \langle x, \varphi_i \rangle \varphi_i, \tag{10.14} $$
and the energy in the expansion coefficients
$$ \|\alpha\|^2 = \frac{3}{2}\|x\|^2. \tag{10.15} $$
The difference between (10.13) and (10.3) is in normalization; thus, the factor $3/2$ appears in (10.15), showing that the energy in the expansion coefficients is $3/2$ times larger than that of the input vector. When the frame vectors are of unit norm as in this case, this factor represents the redundancy¹³² of the system: we have $3/2$ times more vectors than needed to represent a vector in $\mathbb{R}^2$.
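The sketch below verifies (10.13)–(10.15) with NumPy. The unit-norm frame satisfies $\Phi\Phi^T = (3/2) I$, so the coefficient energy is scaled by the redundancy $3/2$, and the factor $2/3$ undoes that scaling at reconstruction.

```python
import numpy as np

Phi1 = np.array([[1.0, -0.5, -0.5],
                 [0.0, np.sqrt(3) / 2, -np.sqrt(3) / 2]])   # unit-norm frame (10.13)

assert np.allclose(np.linalg.norm(Phi1, axis=0), 1.0)
assert np.allclose(Phi1 @ Phi1.T, 1.5 * np.eye(2))        # tight with bound 3/2

x = np.array([2.0, -1.0])
alpha = Phi1.T @ x
assert np.isclose(alpha @ alpha, 1.5 * (x @ x))            # (10.15)
assert np.allclose((2 / 3) * Phi1 @ alpha, x)              # (10.14)
```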

The frame (10.3) and its normalized version (10.13) are instances of the so-called tight frames. A tight frame has a right inverse that is its own transpose and conserves energy (both within a scale factor). While the tight frame vectors we have seen are of the same norm, in general this is not a requirement for tightness, as we will see later in the chapter.

132 There exists a precise quantitative definition of redundancy; see Further Reading for details.

Filter-Bank Implementation As we have seen in previous chapters, infinite-dimensional expansions can be implemented using filter banks. Let us for a moment go back to the Haar expansion for $\ell^2(\mathbb{Z})$ and try to draw some parallels. First, we have, until now, seen many times a $2\times 2$ Haar matrix $\Phi$ as a basis for $\mathbb{R}^2$. Then, in Chapter 6, we used these Haar vectors to form a basis for $\ell^2(\mathbb{Z})$, by slicing the infinite-length sequence into pieces of length 2 and applying a Haar basis to each of these. The resulting basis sequences for $\ell^2(\mathbb{Z})$ are then obtained as infinite sequences with two nonzero elements only, shifted by integer multiples of 2, (1.3), (1.5). Finally, in Chapter 7, (7.2)–(7.3), we showed how to implement such an orthonormal expansion for $\ell^2(\mathbb{Z})$ by using a two-channel filter bank.

We can do exactly the same here. We slice an infinite-length input sequence into pieces of length 2 and apply the frame we just saw to each of these. To form a frame for $\ell^2(\mathbb{Z})$, we form three template frame vectors from the vectors (10.1) as:



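$$
\varphi_{0,n} = \sqrt{2/3}\,\delta_n, \qquad
\varphi_{1,n} = -\frac{1}{\sqrt{6}}\,\delta_n + \frac{1}{\sqrt{2}}\,\delta_{n-1}, \qquad
\varphi_{2,n} = -\frac{1}{\sqrt{6}}\,\delta_n - \frac{1}{\sqrt{2}}\,\delta_{n-1}. \tag{10.16}
$$
We then form all the other frame sequences as versions of (10.16) shifted by integer multiples of 2:
$$ \Phi = \{\varphi_{0,n-2k},\ \varphi_{1,n-2k},\ \varphi_{2,n-2k}\}_{k\in\mathbb{Z}}. $$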

To implement this frame expansion using signal processing machinery, we do exactly the same as we did for the Haar basis in Section 7.1: we rename the template frame sequences $\varphi_0 = g_0$, $\varphi_1 = g_1$ and $\varphi_2 = g_2$. Then we can write the reconstruction formula as
$$
x_n = \sum_{k\in\mathbb{Z}} \alpha_{0,k}\, g_{0,n-2k} + \sum_{k\in\mathbb{Z}} \alpha_{1,k}\, g_{1,n-2k} + \sum_{k\in\mathbb{Z}} \alpha_{2,k}\, g_{2,n-2k}, \tag{10.17a}
$$
with
$$ \alpha_{i,k} = \langle x_n, g_{i,n-2k} \rangle_n. \tag{10.17b} $$
There is really no difference between (10.17a)–(10.17b) and (7.2)–(7.3), except that we have 3 template frame sequences here instead of 2 template basis sequences for Haar.¹³³ We thus know exactly how to implement (10.17): it is going to be a 3-channel filter bank with down/upsampling by 2, as shown in Figure 10.2, with synthesis filters' impulse responses given by the frame vectors, and analysis filters' impulse responses given by the time-reversed frame vectors.

133 Unlike for the Haar case, the $\alpha_i$ are not unique.
[Figure: analysis filters $g_{0,-n}$, $g_{1,-n}$, $g_{2,-n}$, each followed by downsampling by 2; synthesis by upsampling by 2 followed by $g_{0,n}$, $g_{1,n}$, $g_{2,n}$, with the branch outputs summed.]
Figure 10.2: A 3-channel filter bank with sampling by 2 implementing a tight frame expansion.
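The filter bank of Figure 10.2 can be sketched with NumPy convolutions. Analysis filters with the time-reversed templates and keeps every other output. Synthesis upsamples by 2, filters with the templates (10.16), and sums the three branches, as in (10.17a). The sketch assumes a finite, even-length input standing in for a sequence in $\ell^2(\mathbb{Z})$; the templates act on non-overlapping blocks, so no boundary handling is needed. The functions `analysis` and `synthesis` are hypothetical names.

```python
import numpy as np

s6, s2 = np.sqrt(6), np.sqrt(2)
# Template frame sequences g_i = phi_i from (10.16), supported on n = 0, 1
g = [np.array([np.sqrt(2 / 3), 0.0]),
     np.array([-1 / s6, 1 / s2]),
     np.array([-1 / s6, -1 / s2])]

def analysis(x, g):
    """alpha_{i,k} = <x, g_{i,n-2k}>: filter with time-reversed g_i, then downsample."""
    # c[n] = g_i[1] x[n] + g_i[0] x[n-1], so c[2k+1] = g_i[0] x[2k] + g_i[1] x[2k+1]
    return [np.convolve(x, gi[::-1])[1::2] for gi in g]

def synthesis(alpha, g, n):
    """x_n = sum_i sum_k alpha_{i,k} g_{i,n-2k}: upsample by 2, filter, add (10.17a)."""
    x = np.zeros(n)
    for ai, gi in zip(alpha, g):
        u = np.zeros(2 * len(ai))
        u[::2] = ai
        x += np.convolve(u, gi)[:n]
    return x

rng = np.random.default_rng(6)
x = rng.standard_normal(16)
alpha = analysis(x, g)
x_rec = synthesis(alpha, g, len(x))
assert np.allclose(x_rec, x)
```

Since this frame is a Parseval tight frame, the energy in the three channels also equals $\|x\|^2$.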



Figure 10.3: The three vectors $\{\varphi_0, \varphi_1, \varphi_2\}$ from (10.18) form a frame for $\mathbb{R}^2$ (the same as Figure 1.4(a)).



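A General Frame for $\mathbb{R}^2$

Our second example is that from (1.14), again a set of three vectors in $\mathbb{R}^2$,
$$
\varphi_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\varphi_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad
\varphi_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \tag{10.18}
$$
the standard orthonormal basis $\{\varphi_0, \varphi_1\}$ plus a third vector. We follow the same path as we just did to spot commonalities and differences.

Expansion Again, these vectors clearly span $\mathbb{R}^2$ since any two of them do (see Figure 10.3). We have already paved the path for representing a vector $x \in \mathbb{R}^2$ as a linear combination of $\{\varphi_i\}_{i=0,1,2}$ by introducing a matrix $\Phi$ as in (1.16a),
$$
\Phi = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}. \tag{10.19}
$$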
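Unlike (10.3), this $\Phi$ does not have orthogonal rows, and thus $\Phi^T$ is not one of its right inverses. We call these right inverses $\tilde{\Phi}^T$ and $\tilde{\Phi}$ a dual frame. Then
$$ \Phi\tilde{\Phi}^T = I_2. $$
We have seen one possible dual frame in (1.16c):¹³⁴
$$
\tilde{\Phi} = \frac{1}{3}\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \end{bmatrix}, \tag{10.20}
$$
with the associated expansion
$$
x \overset{(a)}{=} \sum_{i=0}^{2} \langle x, \tilde{\varphi}_i \rangle \varphi_i \overset{(b)}{=} \sum_{i=0}^{2} \langle x, \varphi_i \rangle \tilde{\varphi}_i, \tag{10.21}
$$
where we expressed $x$ both in the frame (a) and the dual frame (b). This looks exactly like the usual biorthogonal expansion (1.105a), except for the number of vectors involved. The dual frame vectors are:
$$
\tilde{\varphi}_0 = \frac{1}{3}\begin{bmatrix} 2 \\ -1 \end{bmatrix}, \quad
\tilde{\varphi}_1 = \frac{1}{3}\begin{bmatrix} -1 \\ 2 \end{bmatrix}, \quad
\tilde{\varphi}_2 = \frac{1}{3}\begin{bmatrix} -1 \\ -1 \end{bmatrix}. \tag{10.22}
$$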
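The NumPy sketch below computes the dual frame from the formula $\tilde{\Phi} = (\Phi\Phi^T)^{-1}\Phi$. This formula is taken here as an assumption, since the chapter develops the canonical dual only later. The sketch then checks that the result coincides with (10.20), and that both forms of (10.21) reconstruct $x$.

```python
import numpy as np

Phi = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, -1.0]])                  # frame matrix (10.19)
Phi_dual = np.linalg.inv(Phi @ Phi.T) @ Phi         # assumed formula for the canonical dual

assert np.allclose(3 * Phi_dual, [[2, -1, -1], [-1, 2, -1]])   # matches (10.20)
assert np.allclose(Phi @ Phi_dual.T, np.eye(2))                # right inverse

x = np.array([1.5, -2.0])
x_a = Phi @ (Phi_dual.T @ x)     # (10.21)(a): coefficients <x, phi~_i>, synthesis with phi_i
x_b = Phi_dual @ (Phi.T @ x)     # (10.21)(b): coefficients <x, phi_i>, synthesis with phi~_i
assert np.allclose(x_a, x) and np.allclose(x_b, x)
```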



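Geometry of the Expansion In the previous example, the geometry of the expansion was captured by $\Phi$ and $\Phi^\perp$ as in (10.8); here, we must add the dual frame $\tilde{\Phi}$ and its complement $\tilde{\Phi}^\perp$. Two possible complements corresponding to $\Phi$ and $\tilde{\Phi}$ are
$$
\Phi^\perp = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}, \qquad
\tilde{\Phi}^\perp = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}, \tag{10.23}
$$
both of size $1\times 3$. Then, the following capture the geometry of the matrices involved:
$$ \Phi\tilde{\Phi}^T = I_{2\times 2}, \tag{10.24a} $$
$$ \Phi(\tilde{\Phi}^\perp)^T = 0_{2\times 1}, \tag{10.24b} $$
$$ \Phi^\perp\tilde{\Phi}^T = 0_{1\times 2}, \tag{10.24c} $$
$$ \Phi^\perp(\tilde{\Phi}^\perp)^T = I_{1\times 1} = 1. \tag{10.24d} $$
Thus, $\mathbb{R}^3$ is spanned by both $\Phi^T \oplus (\Phi^\perp)^T$ and $\tilde{\Phi}^T \oplus (\tilde{\Phi}^\perp)^T$.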
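The four relations (10.24a)–(10.24d) can be checked with NumPy, using the complements in (10.23). The complements are fixed only up to a scale factor whose product must be 1. Together, the four relations say that the stacked $3\times 3$ matrices $[\Phi;\ \Phi^\perp]$ and $[\tilde{\Phi};\ \tilde{\Phi}^\perp]$ are biorthogonal.

```python
import numpy as np

Phi = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
Phi_dual = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0]]) / 3
Phi_perp = np.ones((1, 3))             # complement of Phi, (10.23)
Phi_dual_perp = np.ones((1, 3)) / 3    # complement of the dual frame, (10.23)

assert np.allclose(Phi @ Phi_dual.T, np.eye(2))              # (10.24a)
assert np.allclose(Phi @ Phi_dual_perp.T, np.zeros((2, 1)))  # (10.24b)
assert np.allclose(Phi_perp @ Phi_dual.T, np.zeros((1, 2)))  # (10.24c)
assert np.allclose(Phi_perp @ Phi_dual_perp.T, 1.0)          # (10.24d)
```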



Energy of Expansion Coefficients We saw how energy is conserved in a Parseval-like manner before; what can we say about $\|\alpha\|$ here? Taking $\alpha = \Phi^T x$, the coefficients of expansion (10.21)(b),
$$ \|\alpha\|^2 = \alpha^T\alpha = x^T\Phi\Phi^T x = x^T U\Sigma U^T x, $$

134 Note that while there exist infinitely many dual frames since $\Phi$ has a nontrivial null space, here we concentrate on the canonical one, as will be clear later in the chapter.
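[Figure: analysis filters $\tilde{g}_{0,n}$, $\tilde{g}_{1,n}$, $\tilde{g}_{2,n}$, each followed by downsampling by 2; synthesis by upsampling by 2 followed by $g_{0,n}$, $g_{1,n}$, $g_{2,n}$, with the branch outputs summed.]
Figure 10.4: A 3-channel filter bank with sampling by 2 implementing a general frame expansion.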



where we have performed a singular value decomposition (1.213) on the Hermitian matrix $\Phi\Phi^T$ via (1.223a) as
$$
\Phi\Phi^T = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = U\Sigma U^T
= \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}
\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}. \tag{10.25a}
$$
Because $\Phi\Phi^T$ is a Hermitian matrix, (1.225) holds, that is,
$$ \lambda_{\min} I \le \Phi\Phi^T \le \lambda_{\max} I, \tag{10.25b} $$
where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $\Phi\Phi^T$. Thus, with $\lambda_{\min} = 1$ and $\lambda_{\max} = 3$, we get
$$ \|x\|^2 \le \|\Phi^T x\|^2 \le 3\|x\|^2. \tag{10.26} $$
Therefore, although energy is not preserved, it is bounded from below and above by the eigenvalues of $\Phi\Phi^T$. Depending on the range between the minimum and maximum eigenvalues, the energy can fluctuate; in general, the closer (tighter) the eigenvalues are, the better-behaved the frame is.¹³⁵ The set of inequalities (10.26) is similar to how Riesz bases were defined in (1.80), and a similar relation holds for the dual frame $\tilde{\Phi}$,
$$ \frac{1}{\lambda_{\max}} I \le \tilde{\Phi}\tilde{\Phi}^T \le \frac{1}{\lambda_{\min}} I, $$
and thus
$$ \frac{1}{3}\|x\|^2 \le \|\tilde{\Phi}^T x\|^2 \le \|x\|^2. $$
This frame is related to the previous one in the same way a biorthogonal basis is related to an orthonormal basis, and is called a general frame.
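The frame bounds are the extreme eigenvalues of $\Phi\Phi^T$. The NumPy sketch below computes them, together with the reciprocal bounds of the dual, and checks (10.26) and its dual counterpart on random vectors. As in the earlier sketch, the canonical dual is formed as $(\Phi\Phi^T)^{-1}\Phi$, which reproduces (10.20).

```python
import numpy as np

Phi = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
Phi_dual = np.linalg.inv(Phi @ Phi.T) @ Phi        # equals (10.20)

lam = np.linalg.eigvalsh(Phi @ Phi.T)              # ascending order
assert np.allclose(lam, [1.0, 3.0])
assert np.allclose(np.linalg.eigvalsh(Phi_dual @ Phi_dual.T), [1 / 3, 1.0])

rng = np.random.default_rng(7)
for _ in range(1000):
    x = rng.standard_normal(2)
    nx = x @ x
    e = np.sum((Phi.T @ x) ** 2)
    e_dual = np.sum((Phi_dual.T @ x) ** 2)
    assert nx - 1e-12 <= e <= 3 * nx + 1e-12           # (10.26)
    assert nx / 3 - 1e-12 <= e_dual <= nx + 1e-12      # bounds for the dual frame
```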

Filter-Bank Implementation In parallel to what we have done for (10.1), we can use this finite-dimensional frame as an expansion for sequences in $\ell^2(\mathbb{Z})$ by slicing the input sequence into pieces of length 2 and applying the frame we just saw to each of these. To form a frame for $\ell^2(\mathbb{Z})$, we form three template frame vectors from the three $\mathbb{R}^2$ vectors (10.18):
$$
\varphi_{0,n} = \delta_n, \qquad \varphi_{1,n} = \delta_{n-1}, \qquad \varphi_{2,n} = -\delta_n - \delta_{n-1}. \tag{10.27}
$$
To form the dual frame, we form three template vectors from the three $\mathbb{R}^2$ dual frame vectors (10.22), leading to the frame $\Phi$ and dual frame $\tilde{\Phi}$:
$$
\Phi = \{\varphi_{0,n-2k}, \varphi_{1,n-2k}, \varphi_{2,n-2k}\}_{k\in\mathbb{Z}}, \qquad
\tilde{\Phi} = \{\tilde{\varphi}_{0,n-2k}, \tilde{\varphi}_{1,n-2k}, \tilde{\varphi}_{2,n-2k}\}_{k\in\mathbb{Z}}.
$$
Renaming the template frame sequences $\varphi_{i,n} = g_{i,n}$ and the dual ones $\tilde{\varphi}_{i,n} = \tilde{g}_{i,-n}$, we again have a 3-channel filter bank with down/upsampling by 2, as in Figure 10.4.

135 This explains the word tight in tight frames (where the eigenvalues are equal).
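Because the templates have length 2 and are shifted by 2, the filter bank of Figure 10.4 acts blockwise. The NumPy sketch below uses this block (polyphase-style) view instead of explicit convolutions. It reshapes the input into length-2 blocks, analyzes each block with the dual templates (10.22), and synthesizes with the frame templates (10.27). A finite, even-length input is assumed as a stand-in for a sequence in $\ell^2(\mathbb{Z})$.

```python
import numpy as np

Phi = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])             # templates (10.27) on n = 0, 1
Phi_dual = np.array([[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0]]) / 3  # dual templates (10.22)

rng = np.random.default_rng(8)
x = rng.standard_normal(20)
blocks = x.reshape(-1, 2).T                 # column k holds [x_{2k}, x_{2k+1}]

alpha = Phi_dual.T @ blocks                 # channel i, time k: <x, phi~_{i,n-2k}>
x_rec = (Phi @ alpha).T.reshape(-1)         # synthesis with the frame templates
assert np.allclose(x_rec, x)
```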

Choosing the Frame Expansion and Expansion Coefficients

So far, we have seen two redundant representations, a tight frame (10.3), akin to an orthonormal basis, and a general frame (10.19), akin to a biorthogonal basis. We showed properties, including robustness to noise and loss. Given a sequence $x$, how do we then choose an appropriate frame expansion? Moreover, as we have already mentioned, we can have infinitely many dual frames, and thus infinitely many expansion coefficients $\alpha'$; which one should we choose? We tackle these questions in the next section; here, we just show a simple example that indicates the trade-offs.

Choosing the Frame Expansion Assume we are working in $\mathbb{C}^N$ and we are given an input sequence $x$ consisting of a single complex sinusoidal sequence of unknown frequency $(2\pi/N)\ell$ and a Kronecker delta sequence of unknown location $k$:
$$ x_n = \beta_1 e^{j(2\pi/N)\ell n} + \beta_2\,\delta_{n-k}. $$
As discussed in Chapter 6, we can use a length-$N$ DFT to expand $x$: we know this will effectively localize the sinusoid in frequency, but will do a poor job in time, and the location of the Kronecker delta impulse will be essentially lost. We can use the dual, standard basis, with the dual effect: it will do an excellent job of localizing the Kronecker delta impulse but will fail in localizing the frequency of the sinusoid. While we could use a wavelet representation from Chapter 9, an even more obvious option is to use both bases at the same time, effectively creating a frame with the following $2N$ frame vectors:
$$ \Phi = \begin{bmatrix} \mathrm{DFT} & I \end{bmatrix}, \tag{10.28} $$
where the first $N$ are the DFT basis vectors (2.160),
$$ \varphi_i = \frac{1}{\sqrt{N}}\begin{bmatrix} W_N^{0} & W_N^{-i} & \cdots & W_N^{-i(N-1)} \end{bmatrix}^T, $$
while the last $N$ are the standard basis vectors $\delta_{n-i}$. Using (1.136), we see that this is a tight frame since¹³⁶
$$
\Phi\Phi^* = \begin{bmatrix} \mathrm{DFT} & I \end{bmatrix} \begin{bmatrix} \mathrm{DFT}^* \\ I \end{bmatrix} = \mathrm{DFT}\,\mathrm{DFT}^* + I = 2I. \tag{10.29}
$$



Choosing the Expansion Coefficients As $x$ has only two components, there exists a way to write it as
$$ x = \Phi\alpha', $$
where $\alpha'$ has exactly 2 nonzero coefficients,¹³⁷
$$ \alpha' = \begin{bmatrix} 0 & \cdots & \sqrt{N}\beta_1 & \cdots & \beta_2 & \cdots & 0 \end{bmatrix}^T, $$
where $\sqrt{N}\beta_1$ is at the $\ell$th location (the factor $\sqrt{N}$ compensates for the unit-norm DFT vectors) and $\beta_2$ at the $(N+k)$th location. Such an expansion is called sparse, in the sense that it uses a small number of frame vectors. This is different from $\alpha$ obtained from the canonical dual $\tilde{\Phi} = (1/2)\Phi$,
$$ \alpha = \tilde{\Phi}^* x = \tfrac{1}{2}\Phi^* x, $$
which has two dominant components at the same locations as $\alpha'$, but also many more nonzero components. We will see later that, while $\alpha'$ has fewer nonzero coefficients, $\alpha$ has a smaller $\ell^2$ norm (see Solved Exercise 10.1).¹³⁸ This is an important message: while in bases the expansion coefficients are always unique, in frames they are not, and minimizing different norms will lead to different expansions.
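The trade-off between sparsity and energy shows up in a small experiment. The NumPy sketch below uses illustrative values $N = 16$, $\ell = 3$, $k = 5$, $\beta_1 = 1$, $\beta_2 = 2$. It builds the frame (10.28) with unit-norm DFT columns $W_N^{-in}/\sqrt{N}$ and checks (10.29). It then compares the sparse coefficients $\alpha'$ with the canonical ones $\alpha = \frac{1}{2}\Phi^* x$. Both reconstruct $x$; the sparse vector has two nonzeros, while the canonical one is dense but has the smaller $\ell^2$ norm, as expected of a minimum-norm solution of $\Phi\alpha = x$.

```python
import numpy as np

N, ell, k = 16, 3, 5                    # illustrative sizes and locations
beta1, beta2 = 1.0, 2.0
n = np.arange(N)

F = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # columns W_N^{-i n} / sqrt(N)
Phi = np.hstack([F, np.eye(N)])                            # (10.28), size N x 2N
assert np.allclose(Phi @ Phi.conj().T, 2 * np.eye(N))      # tight frame, (10.29)

x = beta1 * np.exp(2j * np.pi * ell * n / N) + beta2 * (n == k)

alpha_sparse = np.zeros(2 * N, dtype=complex)
alpha_sparse[ell] = np.sqrt(N) * beta1
alpha_sparse[N + k] = beta2
alpha_canon = 0.5 * Phi.conj().T @ x                       # canonical dual (1/2) Phi

for a in (alpha_sparse, alpha_canon):
    assert np.allclose(Phi @ a, x)                         # both reconstruct x

def nnz(a, tol=1e-9):
    """Number of coefficients with magnitude above tol."""
    return int(np.sum(np.abs(a) > tol))

print(nnz(alpha_sparse), nnz(alpha_canon))
print(np.linalg.norm(alpha_sparse), np.linalg.norm(alpha_canon))
```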

Chapter Outline

This chapter is somewhat unusual in its scope. While most of the chapters in Part II deal either with Fourier- or wavelet-like expansions, this chapter deals with both. However, there is one important distinction: these expansions are all overcomplete, or redundant; hence our decision to keep them all in one chapter.

Unlike for bases, where we have discussed standard finite-dimensional expansions such as the DFT, we have not done so with frames until now, and thus Section 10.2 investigates finite-dimensional frames. We then resume the structure we have been following starting with Chapter 7; that is, we discuss the signal-processing vehicle for implementing frame expansions, oversampled filter banks. We follow with local Fourier frames in Section 10.4 and wavelet frames in Section 10.5. Section 10.6 concludes with computational aspects.

The sections that follow can also be seen as redundant counterparts of previous chapters. For example, Section 10.2 on finite-dimensional frames has its basis counterpart in Chapters 1 and 2, where we discussed finite-dimensional bases in general (Chapter 1) as well as some specific ones, such as the DFT (Chapter 2).



136 Note that now we use the Hermitian transpose of $\Phi$, as it contains complex entries.
137 Although it might not be obvious how to calculate that expansion.

norm. 
Section 10.3 on oversampled filter banks has its basis (critically-sampled filter bank) counterpart in Chapter 7 (two-channel critically-sampled filter banks), Chapter 8 (N-channel critically-sampled filter banks) and Chapter 9 (tree-structured critically-sampled filter banks). Section 10.4 on local Fourier frames has its basis counterpart in Chapter 8 (local Fourier bases on sequences), while Section 10.5 on wavelet frames has its basis counterpart in Chapter 9 (wavelet bases on sequences). Thus, this chapter also plays a unifying role in summarizing concepts on expansions of sequences. The two chapters that follow this one will deal with functions instead of sequences.



10.2 Finite-Dimensional Frames 

We have just seen examples showing that finite-dimensional overcomplete sets of vectors have properties similar to orthonormal and/or biorthogonal bases. We will now look into general properties of such finite-dimensional frames in $\mathbb{C}^N$, with the understanding that $\mathbb{R}^N$ is just a special case. We start with tight frames and follow with general frames. Since the representation in a frame is in general nonunique, we discuss how to compute expansion coefficients, and point out that, depending on which norm is minimized, different solutions are obtained.

Finite-dimensional frames are represented via rectangular matrices; thus, all 
the material in this section is basic linear algebra. We use this simplicity to develop 
the geometric intuition to be carried to infinite-dimensional frames that follow. 



10.2.1 Tight Frames for $\mathbb{C}^N$

We work in a finite-dimensional space $\mathbb{C}^N$, where a set of vectors $\Phi = \{\varphi_i\}_{i=0}^{M-1}$, $M > N$, is a frame represented (similarly to (10.3) and (10.19)) by the frame matrix $\Phi$ as
$$ \Phi = \begin{bmatrix} \varphi_0 & \varphi_1 & \cdots & \varphi_{M-1} \end{bmatrix}_{N\times M}. \tag{10.30} $$
Assume that $\operatorname{rank}(\Phi) = N$, that is, the column range of $\Phi$ is $\mathbb{C}^N$. Thus, any $x \in \mathbb{C}^N$ can be written as a nonunique linear combination of the $\varphi_i$'s.

We impose a further constraint and start with frames that satisfy a Parseval-like equality:



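Definition 10.1 (Tight frame) A family $\Phi = \{\varphi_i\}_{i=0}^{M-1}$ in $\mathbb{C}^N$ is called a tight frame, or $\lambda$-tight frame, when there exists a constant $0 < \lambda < \infty$, called the frame bound, such that for all $x \in \mathbb{C}^N$,
$$ \lambda\|x\|^2 = \sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2 = \|\Phi^* x\|^2. \tag{10.31} $$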



a.3.0 [October 2011] CC by-nc-nd Comments to book-errata@FouricrAndWavclets.org 



Fourier and Wavelet Signal Processing Copyright 2011 m. Vetterii, j. Kovaccvic, and v. k. Goyai 



726 Chapter 10. Local Fourier and Wavelet Frames on Sequences 

Expansion

Equation (10.31) has a number of consequences. First, it means that
$$ \Phi\Phi^* = \lambda I. \tag{10.32} $$
Thus, $(1/\lambda)\Phi^*$ is a right inverse of $\Phi$. Calling $\tilde{\Phi}$ the dual frame as before, we see that
$$ \tilde{\Phi} = \frac{1}{\lambda}\Phi. \tag{10.33} $$
Then:
$$ x = \Phi\alpha, \qquad \alpha = \frac{1}{\lambda}\Phi^* x, \tag{10.34a} $$
$$ x = \frac{1}{\lambda}\sum_{i=0}^{M-1} \langle x, \varphi_i \rangle \varphi_i. \tag{10.34b} $$
This looks very similar to an orthonormal basis expansion, except for the scaling factor $1/\lambda$ and the fact that $\Phi = \{\varphi_i\}_{i=0}^{M-1}$ cannot be a basis, since the $\varphi_i$ are not linearly independent. We can pull the factor $1/\lambda$ into the sum and renormalize the frame vectors as $\varphi'_i = (1/\sqrt{\lambda})\varphi_i$, leading to an expression that formally looks identical to that of an orthonormal basis expansion:
$$ x = \sum_{i=0}^{M-1} \langle x, \varphi'_i \rangle \varphi'_i. \tag{10.35} $$



We have already seen an example of such a renormalization in (10.1) and (10.13). A frame normalized so that $\lambda = 1$ is called a Parseval tight frame or a 1-tight frame.

The expression for $x$ is what is typically called a reconstruction or a representation of a sequence, or, in filter banks, synthesis, while the expression for the expansion coefficients $\alpha$ is a decomposition, or, in filter banks, analysis.

In the discussion above, we said nothing about the norms of the individual frame vectors. Since the analysis computes the inner products $\langle x, \varphi_i \rangle$, it often makes sense for all $\varphi_i$ to have the same norm, leading to an equal-norm frame (which may not be tight). When we combine equal norm with tightness, we get a frame that acts every bit like an orthonormal basis, except for the redundancy. That is, all inner products $\langle x, \varphi_i \rangle$ are projections of $x$ onto vectors of the same norm, allowing us to compare the coefficients $\alpha_i$ to each other. Moreover, because the frame is tight, the right inverse is simply its adjoint (within scaling). In finite dimensions, tightness corresponds to the rows of $\Phi$ being orthogonal and of equal norm $\sqrt{\lambda}$. Because of this, it is in general hard to obtain an equal-norm tight frame starting from a tight one: rescaling the columns to equal norm generally destroys the orthogonality of the rows.
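As a quick numerical check of (10.31)-(10.35), the following Python/NumPy sketch uses an assumed example frame, three unit-norm vectors at angles $0$, $2\pi/3$, $4\pi/3$ in $\mathbb{R}^2$ (an equal-norm tight frame), and verifies tightness, reconstruction, and the Parseval renormalization. All variable names are illustrative only.

```python
import numpy as np

# Assumed example: three unit-norm vectors at angles 0, 2pi/3, 4pi/3 (columns of Phi).
angles = 2 * np.pi * np.arange(3) / 3
Phi = np.vstack([np.cos(angles), np.sin(angles)])    # N x M = 2 x 3

# Tightness (10.32): Phi Phi^* = lambda I; here lambda = M / N = 3/2.
T = Phi @ Phi.conj().T
lam = T[0, 0].real
is_tight = np.allclose(T, lam * np.eye(2))

# Parseval-like equality (10.31) and expansion (10.34a): alpha = (1/lambda) Phi^* x, x = Phi alpha.
x = np.array([0.3, -1.2])
parseval_ok = np.isclose(np.linalg.norm(Phi.conj().T @ x) ** 2, lam * np.linalg.norm(x) ** 2)
alpha = Phi.conj().T @ x / lam
x_rec = Phi @ alpha

# Renormalized vectors phi'_i = phi_i / sqrt(lambda) form a 1-tight (Parseval) frame, cf. (10.35).
Phi_p = Phi / np.sqrt(lam)
is_parseval = np.allclose(Phi_p @ Phi_p.conj().T, np.eye(2))

print(lam, is_tight, parseval_ok, np.allclose(x_rec, x), is_parseval)
```

Because every column has unit norm and the frame is tight, the frame bound equals the redundancy $M/N$, which is why `lam` comes out as $3/2$.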

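Geometry of the Expansion Let us explore the geometry of tight frames; for simplicity, we assume the frame has been normalized to be 1-tight, so that $\Phi\Phi^* = I$. With the frame matrix $\Phi$ as in (10.30), and as we did in (10.8), we introduce $\Phi^\perp$,

$$\Phi^\perp = \begin{bmatrix} \varphi_0^\perp & \varphi_1^\perp & \cdots & \varphi_{M-1}^\perp \end{bmatrix}_{(M-N) \times M}, \qquad (10.36)$$

as one possible orthogonal complement of $\Phi$ in $\mathbb{C}^M$:

$$\Psi^* = \begin{bmatrix} \Phi \\ \Phi^\perp \end{bmatrix} = \begin{bmatrix} \varphi_0 & \varphi_1 & \cdots & \varphi_{M-1} \\ \varphi_0^\perp & \varphi_1^\perp & \cdots & \varphi_{M-1}^\perp \end{bmatrix}_{M \times M}, \qquad (10.37)$$

that is, the rows of $\Phi^\perp$ are chosen to be orthogonal to the rows of $\Phi$, orthogonal to each other, and of norm 1, or,

$$\Psi^*\Psi = \begin{bmatrix} \Phi \\ \Phi^\perp \end{bmatrix} \begin{bmatrix} \Phi^* & (\Phi^\perp)^* \end{bmatrix} = \begin{bmatrix} \Phi\Phi^* & \Phi(\Phi^\perp)^* \\ \Phi^\perp\Phi^* & \Phi^\perp(\Phi^\perp)^* \end{bmatrix} = \begin{bmatrix} I_{N \times N} & 0 \\ 0 & I_{(M-N) \times (M-N)} \end{bmatrix}. \qquad (10.38)$$

Note that each vector $\varphi_i$ is in $\mathbb{C}^N$, while each vector $\varphi_i^\perp$ is in $\mathbb{C}^{M-N}$. We can rewrite (10.37) as

$$S = \operatorname{span}(\Phi^T) \subset \mathbb{C}^M, \qquad (10.39a)$$

$$S^\perp = \operatorname{span}((\Phi^\perp)^T) \subset \mathbb{C}^M, \qquad (10.39b)$$

$$\mathbb{C}^M = S \oplus S^\perp, \qquad (10.39c)$$

and, because of (10.38),

$$\Phi\Phi^* = I_{N \times N}, \qquad (10.40a)$$

$$\Phi(\Phi^\perp)^* = 0_{N \times (M-N)}, \qquad (10.40b)$$

$$\Phi^\perp(\Phi^\perp)^* = I_{(M-N) \times (M-N)}. \qquad (10.40c)$$

A vector of expansion coefficients $\alpha' \in \mathbb{C}^M$ can be written as

$$\alpha' = \alpha + \alpha^\perp, \qquad (10.41a)$$

with $\alpha \in S$ and $\alpha^\perp \in S^\perp$, and thus

$$\langle \alpha, \alpha^\perp \rangle = 0, \qquad (10.41b)$$

as we have already seen in the simple example in the previous section, (10.11b).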

Relation to Orthonormal Bases 

We now look into connections between tight frames and orthonormal bases. 

Theorem 10.2 A 1-tight frame with unit-norm vectors is an orthonormal basis.

Proof. For a tight frame expansion with $\lambda = 1$,

$$T = \Phi\Phi^* = I,$$

and thus, all the eigenvalues of $T$ are equal to 1. Using one of the useful frame relations we introduce later in (10.64a),

$$N \overset{(a)}{=} \sum_{j=0}^{N-1} \lambda_j \overset{(b)}{=} \sum_{i=0}^{M-1} \|\varphi_i\|^2 \overset{(c)}{=} M,$$

where (a) follows from all eigenvalues being equal to 1; (b) from (10.64a); and (c) from all frame vectors being of unit norm. We get that $M = N$; $\Phi$ is then a square matrix with $\Phi\Phi^* = I$, that is, a unitary matrix, and thus, our frame is an orthonormal basis.

In this result, we see once more the tantalizing connection between tight frames and orthonormal bases. In fact, even more is true: unit-norm tight frames and orthonormal bases arise from the minimization of the same quantity, called the frame potential:

$$\mathrm{FP}(\Phi) = \sum_{i,j=0}^{M-1} |\langle \varphi_i, \varphi_j \rangle|^2. \qquad (10.42)$$

In fact, minimizing the frame potential over sets of $M$ unit-norm vectors in $\mathbb{C}^N$ has two possible outcomes:

(i) When $M \le N$, the minimum value of the frame potential is

$$\mathrm{FP}(\Phi) = M,$$

achieved when $\Phi$ is an orthonormal set (an orthonormal basis when $M = N$).

(ii) When $M > N$, the minimum value of the frame potential is

$$\mathrm{FP}(\Phi) = \frac{M^2}{N},$$

achieved when $\Phi$ is a unit-norm tight frame.¹³⁹

This formalizes the intuitive notion that unit-norm tight frames are a natural generalization of orthonormal bases: both result from minimizing the frame potential, with different parameters (number of elements equal to or larger than the dimension of the space). We give pointers to more details on this topic in Further Reading.

Example 10.1 (Tight frames and orthonormal bases) We illustrate this result with an example. Fix $N = 2$.

(i) We first consider the case when $M = N = 2$. Then, we have only two vectors, $\varphi_0$ and $\varphi_1$, both on the unit circle. According to (10.42), the frame potential is

$$\mathrm{FP}(\{\varphi_0, \varphi_1\}) = \|\varphi_0\|^4 + \|\varphi_1\|^4 + 2|\langle \varphi_0, \varphi_1 \rangle|^2 \overset{(a)}{=} 2\left(1 + |\langle \varphi_0, \varphi_1 \rangle|^2\right), \qquad (10.43)$$

where (a) follows from $\varphi_0$ and $\varphi_1$ being of unit norm. The above expression is minimized when $\langle \varphi_0, \varphi_1 \rangle = 0$, that is, when $\varphi_0$ and $\varphi_1$ form an orthonormal basis. In that case, the minimum of the frame potential is $\mathrm{FP} = 2 = N$.



139 This lower bound for frames is known as the Welch bound; it arises when minimizing interuser interference in a CDMA system (see Further Reading for pointers).

Figure 10.5: Minimization of the frame potential for frames with unit-norm vectors. (a) Three unit-norm vectors in $\mathbb{R}^2$. (b) Density plot of the frame potential as a function of the angles between frame vectors. The two minima are identical and appear for $\theta_1 = \pi/3$, $\theta_2 = \pi/3$ and $\theta_1 = 2\pi/3$, $\theta_2 = 2\pi/3$.



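(ii) We now look at $M$ larger than $N$; we choose $M = 3$. Let us fix $\varphi_0 = [1 \ \ 0]^T$; $\varphi_1$ is $\theta_1$ away from $\varphi_0$ in the counterclockwise direction; $\varphi_2$ is $\theta_2$ away from $\varphi_1$ in the counterclockwise direction (see Figure 10.5(a)). The frame potential is now

$$\mathrm{FP}(\{\theta_1, \theta_2\}) = \|\varphi_0\|^4 + \|\varphi_1\|^4 + \|\varphi_2\|^4 + 2\left(|\langle \varphi_0, \varphi_1 \rangle|^2 + |\langle \varphi_0, \varphi_2 \rangle|^2 + |\langle \varphi_1, \varphi_2 \rangle|^2\right) \qquad (10.44)$$

$$= 3 + 2\left(\cos^2\theta_1 + \cos^2\theta_2 + \cos^2(\theta_1 + \theta_2)\right). \qquad (10.45)$$

Figure 10.5(b) shows the density plot of $\mathrm{FP}(\{\theta_1, \theta_2\})$ for $\theta_i \in [0, \pi]$. From the figure, we see that there are two minima, for $\theta_1 = \theta_2 = \pi/3$ and $\theta_1 = \theta_2 = 2\pi/3$, both of which lead to tight frames; the second choice is the frame we have seen in (10.13), and the first choice is, in fact, identical to the second (within reflection). We thus see that the results of minimizing the frame potential in this case are tight frames, with the minimum of

$$\mathrm{FP}\left(\left\{\tfrac{\pi}{3}, \tfrac{\pi}{3}\right\}\right) = \mathrm{FP}\left(\left\{\tfrac{2\pi}{3}, \tfrac{2\pi}{3}\right\}\right) = 3 + 2\left(\tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{4}\right) = \frac{9}{2} = \frac{M^2}{N},$$

as per the theorem.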



This simple example shows that minimizing the frame potential with different 
parameters leads to either orthonormal sets (orthonormal bases when M = N) 
or unit-norm tight frames. 
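The closed form (10.45) and the two minima of Figure 10.5(b) can be reproduced with a coarse grid search. The Python/NumPy sketch below computes the frame potential directly from the definition (10.42) via the Gram matrix; the helper names and the 2-degree grid step are arbitrary choices made for illustration.

```python
import numpy as np

def frame_potential(Phi):
    """FP(Phi) = sum_{i,j} |<phi_i, phi_j>|^2, cf. (10.42); columns of Phi are the phi_i."""
    G = Phi.conj().T @ Phi               # Gram matrix of the frame vectors
    return float(np.sum(np.abs(G) ** 2))

def three_vectors(theta1, theta2):
    """phi_0 = [1 0]^T, phi_1 at angle theta1, phi_2 a further theta2 counterclockwise."""
    a = np.array([0.0, theta1, theta1 + theta2])
    return np.vstack([np.cos(a), np.sin(a)])

# The definition (10.42) agrees with the closed form (10.45).
t1, t2 = 0.4, 1.1
closed_form = 3 + 2 * (np.cos(t1) ** 2 + np.cos(t2) ** 2 + np.cos(t1 + t2) ** 2)
assert np.isclose(frame_potential(three_vectors(t1, t2)), closed_form)

# Grid search over theta_1, theta_2 in [0, pi] with a 2-degree step.
grid = np.linspace(0.0, np.pi, 91)
fp = np.array([[frame_potential(three_vectors(a, b)) for b in grid] for a in grid])
i, j = np.unravel_index(np.argmin(fp), fp.shape)
best = (int(round(float(np.degrees(grid[i])))), int(round(float(np.degrees(grid[j])))))
print(best, fp[i, j])    # one of the two minimizers; the minimum equals M^2/N = 9/2
```

Since the grid contains both $\pi/3$ and $2\pi/3$ exactly, the search lands on one of the two tight-frame minimizers.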



Naimark's Theorem Another powerful connection between orthonormal bases and tight frames also gives a constructive way to obtain all tight frames. It is given by the following theorem, due to Naimark, which says that all tight frames can be obtained by projecting orthonormal bases from a larger space (of dimension $M$) onto a smaller one (of dimension $N$). We have seen one such example in (10.8), where the frame $\Phi$ from (10.3) was obtained by projecting an orthonormal basis from $\mathbb{R}^3$.

Theorem 10.3 (Naimark […], Han & Larson […]) A frame $\Phi$ in $\mathbb{C}^N$ is tight if and only if there exists an orthonormal basis $\Psi$ of $\mathbb{C}^M$, $M > N$, such that, within scaling,

$$\Phi^* = \Psi[J], \qquad (10.46)$$

where $J \subset \{0, 1, \ldots, M-1\}$ is the index set of the retained columns of $\Psi$, a process known as seeding.


Here, we considered only the tight-frame finite-dimensional instantiation of the theorem. For general finite-dimensional frames, a similar result holds, that is, any frame can be obtained by projecting a biorthogonal basis from a larger space. In Theorem 10.8, we formulate the statement for infinite-dimensional frames implementable by oversampled filter banks.

Proof. We are given a tight frame $\Phi$ with columns $\varphi_i$, $i = 0, 1, \ldots, M-1$, and rows $\psi_j$, $j = 0, 1, \ldots, N-1$. Because $\Phi$ is a tight frame, it satisfies (10.32); without loss of generality, renormalize it by $1/\sqrt{\lambda}$ so that the frame we work with is 1-tight. This further means that

$$\langle \psi_i, \psi_j \rangle = \delta_{i-j},$$

that is, $\{\psi_0, \psi_1, \ldots, \psi_{N-1}\}$ is an orthonormal set, and, according to (10.39a), it spans the subspace $S \subset \mathbb{C}^M$. The whole proof in this direction follows from the geometry of tight frames we discussed earlier, by showing, as we did in (10.37), how to complete the tight frame matrix $\Phi$ to obtain an orthonormal basis $\Psi^*$.

The other direction is even easier. Assume we are given a unitary $\Psi$. Choose any $N$ columns of $\Psi$ and call the resulting $M \times N$ matrix $\Phi^*$. Because these columns form an orthonormal set, the rows of $\Phi$ form an orthonormal set, that is,

$$\Phi\Phi^* = I.$$

Therefore, $\Phi$ is a tight frame.
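Both directions of the proof can be checked numerically. The Python/NumPy sketch below first seeds a frame from a random unitary matrix (the retained index set `J` is an arbitrary, hypothetical choice), and then completes the resulting 1-tight frame with an orthonormal $\Phi^\perp$ as in (10.37)-(10.38). Taking $\Phi^\perp$ from the right singular vectors of $\Phi$ is just one convenient choice; any orthonormal basis of the orthogonal complement of the row space would do.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 5

# Seeding: keep N columns of an M x M unitary Psi, so that Phi^* = Psi[J].
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Psi, _ = np.linalg.qr(A)                 # Q factor of a square matrix is unitary
J = [0, 3]                               # hypothetical index set of retained columns
Phi = Psi[:, J].conj().T                 # N x M frame matrix
seeded_is_tight = np.allclose(Phi @ Phi.conj().T, np.eye(N))

# Completion, cf. (10.37)-(10.38): the last M - N right singular vectors of Phi
# are orthonormal and orthogonal to its rows.
_, _, Vh = np.linalg.svd(Phi)            # Vh is M x M unitary
Phi_perp = Vh[N:, :]                     # (M - N) x M
Psi_star = np.vstack([Phi, Phi_perp])    # M x M, expected to be unitary
print(seeded_is_tight, np.allclose(Psi_star @ Psi_star.conj().T, np.eye(M)))
```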



Harmonic Tight Frames We now look into an example of a well-known family of tight frames called harmonic tight frames, a representative of which we have already seen in (10.3). Harmonic tight frames are frame counterparts of the DFT and are, in fact, obtained from the DFT by seeding, a process defined in Theorem 10.3.

Specifically, to obtain harmonic tight frames, we start with the DFT matrix $\Psi = \mathrm{DFT}_M$ given in (2.161a) and delete its last $(M - N)$ columns, yielding:

$$\Phi = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_M & W_M^2 & \cdots & W_M^{M-1} \\ 1 & W_M^2 & W_M^4 & \cdots & W_M^{2(M-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_M^{N-1} & W_M^{(N-1)2} & \cdots & W_M^{(N-1)(M-1)} \end{bmatrix}, \qquad (10.47a)$$

with the corresponding frame vectors

$$\varphi_i = \begin{bmatrix} W_M^0 & W_M^i & \cdots & W_M^{i(N-1)} \end{bmatrix}^T, \qquad (10.47b)$$

for $i = 0, 1, \ldots, M-1$, where $W_M = e^{-j2\pi/M}$ is the principal $M$th root of unity (2.276). The norm of each frame vector is

$$\|\varphi_i\|^2 = \varphi_i^* \varphi_i = N,$$

the frame is $M$-tight,

$$\Phi\Phi^* = M I,$$

and the Parseval-like equality is

$$\|\Phi^* x\|^2 = M \|x\|^2.$$

In its unit-norm version (each $\varphi_i$ scaled by $1/\sqrt{N}$), the frame bound equals the redundancy $M/N$. Harmonic tight frames have a number of other interesting properties, some of which are explored in Exercise 10.3.
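The properties just listed (column norms $N$, $M$-tightness, the Parseval-like equality, and frame bound $M/N$ for the unit-norm version) are easy to confirm numerically. The Python/NumPy sketch below builds (10.47a) directly from its entries $W_M^{ki}$; the sizes `N, M = 3, 7` and the test vector are arbitrary assumptions.

```python
import numpy as np

def harmonic_tight_frame(N, M):
    """N x M frame matrix (10.47a): Phi[k, i] = W_M^(k i) with W_M = exp(-j 2 pi / M)."""
    k = np.arange(N)[:, None]
    i = np.arange(M)[None, :]
    return np.exp(-2j * np.pi * k * i / M)

N, M = 3, 7
Phi = harmonic_tight_frame(N, M)
norms_sq = np.sum(np.abs(Phi) ** 2, axis=0)          # ||phi_i||^2, expected N for every i
T = Phi @ Phi.conj().T                                # expected M I (the frame is M-tight)
x = np.array([1.0, -2.0, 0.5j])
parseval_ok = np.isclose(np.linalg.norm(Phi.conj().T @ x) ** 2, M * np.linalg.norm(x) ** 2)

# Unit-norm version: scale by 1/sqrt(N); its frame bound is the redundancy M/N.
Phi_unit = Phi / np.sqrt(N)
print(norms_sq, np.allclose(T, M * np.eye(N)), parseval_ok,
      np.allclose(Phi_unit @ Phi_unit.conj().T, (M / N) * np.eye(N)))
```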

We now explore one such property, which harmonic tight frames possess.



Definition 10.4 (Frame maximally robust to erasures) A frame $\Phi$ is called maximally robust to erasures when every $N \times N$ submatrix of $\Phi$ is invertible.



We have seen one example of a frame maximally robust to erasures: every $2 \times 2$ submatrix of (10.3) is invertible. The motivation in that example was that such frames can sustain the loss of the maximum number, $M - N$, of expansion coefficients and still afford perfect reconstruction of the original vector. In fact, harmonic tight frames in general possess such a property, since every $N \times N$ submatrix of (10.47a) is invertible (it is a Vandermonde matrix with distinct nodes, whose determinant is always nonzero; see (…) and Exercise …).
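Maximal robustness can be checked by brute force for small sizes, and the same computation shows how a vector is recovered after $M - N$ coefficients are erased. In the Python/NumPy sketch below, the surviving index set `kept` is a hypothetical choice, and the smallest singular value over all $\binom{M}{N}$ submatrices is used as a numerical stand-in for "invertible".

```python
import itertools
import numpy as np

N, M = 3, 7
k = np.arange(N)[:, None]
i = np.arange(M)[None, :]
Phi = np.exp(-2j * np.pi * k * i / M)        # harmonic tight frame (10.47a)

# Definition 10.4: every N x N submatrix (any choice of N columns) must be invertible.
# The smallest singular value over all C(M, N) submatrices quantifies how safely so.
min_sv = min(np.linalg.svd(Phi[:, list(J)], compute_uv=False)[-1]
             for J in itertools.combinations(range(M), N))

# Erase M - N analysis coefficients and reconstruct x from the N that survive.
x = np.array([0.2, -1.0, 3.0 + 1.0j])
coeffs = Phi.conj().T @ x                     # <x, phi_i> for i = 0, ..., M-1
kept = [0, 2, 5]                              # hypothetical surviving indices
x_rec = np.linalg.solve(Phi[:, kept].conj().T, coeffs[kept])
print(min_sv, np.allclose(x_rec, x))
```

The surviving coefficients satisfy `coeffs[kept] = Phi[:, kept]^* x`, an $N \times N$ system that is solvable precisely because of the Vandermonde structure.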

Random Frames

While the tight frames we have just seen may seem very special, it turns out that a unit-norm frame with randomly chosen vectors and high redundancy will be almost tight. This is made precise by the following result:

Theorem 10.5 (Tightness of random frames [62]) Let $\{\Phi_M\}_{M=N}^{\infty}$ be a sequence of frames in $\mathbb{R}^N$ such that $\Phi_M$ is generated by choosing $M$ vectors independently with a uniform distribution on the unit sphere in $\mathbb{R}^N$. Then, in the mean-squared sense,

$$\frac{1}{M}\Phi_M\Phi_M^T \to \frac{1}{N} I_N \quad \text{elementwise as } M \to \infty.$$

An illustration of the theorem for $N = 2$ is given in Figure 10.6.

Figure 10.6: Illustration of tightness of random frames for $N = 2$ and $M = 2, 3, \ldots, 1000$. Since the convergence is elementwise, each graph plots the behavior $\Delta_{ij} = [(1/M)\Phi\Phi^T - (1/2)I_2]_{ij}$ for $i, j = 0, 1$.
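A small simulation in the spirit of Figure 10.6 is sketched below in Python/NumPy. It assumes the standard trick of drawing uniform points on the sphere by normalizing i.i.d. Gaussian vectors; the seed and the list of values of $M$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2

def random_unit_frame(N, M, rng):
    """M vectors uniform on the unit sphere in R^N (normalized i.i.d. Gaussian vectors)."""
    V = rng.standard_normal((N, M))
    return V / np.linalg.norm(V, axis=0)

for M in (10, 100, 1000, 10000):
    Phi = random_unit_frame(N, M, rng)
    Delta = Phi @ Phi.T / M - np.eye(N) / N    # elementwise deviation from (1/N) I_N
    print(M, np.max(np.abs(Delta)))
```

The printed deviation shrinks roughly like $1/\sqrt{M}$, consistent with mean-squared convergence.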

10.2.2 General Frames for $\mathbb{C}^N$

Tight frames are attractive in the same way orthonormal bases are. They obey a Parseval-like energy conservation equality, and the dual frame is equal to the frame itself (the right inverse of $\Phi$ is just its own Hermitian transpose, possibly within scaling). Tightness, however, sometimes has to be relaxed, just as orthonormality does (for example, when we wanted to design two-channel linear-phase filter banks in Chapter 7). Either a frame is given by a specific construction and is not tight, or the constraints posed by tightness are too restrictive for a desired design.

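Given a frame $\Phi$ as in (10.30) with $\operatorname{rank}(\Phi) = N$, we can find the canonical dual frame $\tilde{\Phi}$ (formalized in Definition 10.7), also of size $N \times M$, made of dual frame vectors as

$$\tilde{\Phi} = (\Phi\Phi^*)^{-1}\Phi \qquad (10.48a)$$

$$= \begin{bmatrix} \tilde{\varphi}_0 & \tilde{\varphi}_1 & \cdots & \tilde{\varphi}_{M-1} \end{bmatrix}_{N \times M}, \qquad (10.48b)$$

$$\tilde{\varphi}_i = (\Phi\Phi^*)^{-1}\varphi_i. \qquad (10.48c)$$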



The above are all well defined, since $T = \Phi\Phi^*$ is of rank $N$ and can thus be inverted. Therefore, $\Phi$ and $\tilde{\Phi}$ play the same roles for frames as their namesakes do for biorthogonal bases, and, using (10.48a):

$$\Phi\tilde{\Phi}^* = \Phi\Phi^*(\Phi\Phi^*)^{-1} = I_N, \qquad (10.49a)$$

$$\tilde{\Phi}\Phi^* = (\Phi\Phi^*)^{-1}\Phi\Phi^* = I_N. \qquad (10.49b)$$

Note that the canonical dual frame $\tilde{\Phi}$ chosen here yields a particular right inverse $\tilde{\Phi}^*$ of $\Phi$; we will formalize this in Definition 10.7. Note also that when $\Phi$ is $\lambda$-tight, this definition of the dual indeed gives $\tilde{\Phi} = (1/\lambda)\Phi$, as in (10.33).

Expansion

We know that $\Phi = \{\varphi_i\}_{i=0}^{M-1}$ spans $\mathbb{C}^N$, that is, any $x \in \mathbb{C}^N$ can be written as

$$x = \sum_{i=0}^{M-1} \alpha_i \varphi_i \qquad (10.50a)$$

$$x = \sum_{i=0}^{M-1} \tilde{\alpha}_i \tilde{\varphi}_i, \qquad (10.50b)$$

both of which follow from (10.49) by writing

$$x \overset{(a)}{=} \Phi\tilde{\Phi}^* x = \Phi\alpha, \qquad (10.51a)$$

$$x \overset{(b)}{=} \tilde{\Phi}\Phi^* x = \tilde{\Phi}\tilde{\alpha}, \qquad (10.51b)$$

where (a) leads to (10.50a) and (b) to (10.50b), respectively. Again, these expressions are a reconstruction (representation) of a sequence, or, in filter banks, synthesis.

We have used $\alpha_i$ and $\tilde{\alpha}_i$ liberally, defining them only implicitly. It comes as no surprise that

$$\alpha_i = \langle x, \tilde{\varphi}_i \rangle, \qquad \alpha = \tilde{\Phi}^* x, \qquad (10.52a)$$

$$\tilde{\alpha}_i = \langle x, \varphi_i \rangle, \qquad \tilde{\alpha} = \Phi^* x; \qquad (10.52b)$$

they are interchangeable like the expansion expressions. As before, the expression for $\alpha$ is sometimes called decomposition (or analysis, in filter banks).
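The canonical dual (10.48a), the two right-inverse identities (10.49), and the pair of expansions (10.50)-(10.52) can be verified on a generic frame. In the Python/NumPy sketch below the frame is an assumed random complex $3 \times 5$ matrix (rank $N$ with probability one); solving $T\tilde{\Phi} = \Phi$ avoids forming $T^{-1}$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 3, 5
# Assumed example: a generic complex frame.
Phi = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

T = Phi @ Phi.conj().T                       # frame operator
Phi_dual = np.linalg.solve(T, Phi)           # canonical dual (10.48a): T^{-1} Phi

I_N = np.eye(N)
right_inv_ok = np.allclose(Phi @ Phi_dual.conj().T, I_N)     # (10.49a)
left_ok = np.allclose(Phi_dual @ Phi.conj().T, I_N)          # (10.49b)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
alpha = Phi_dual.conj().T @ x                # (10.52a): alpha_i = <x, dual phi_i>
alpha_t = Phi.conj().T @ x                   # (10.52b): alpha~_i = <x, phi_i>
recon_a = np.allclose(Phi @ alpha, x)        # (10.50a)
recon_b = np.allclose(Phi_dual @ alpha_t, x) # (10.50b)
print(right_inv_ok, left_ok, recon_a, recon_b)
```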

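Geometry of the Expansion Similarly to tight frames, let us explore the geometry of general frames. For tight frames, we dealt with $\Phi$ and $\Phi^\perp$ as in (10.30) and (10.37); here, we must add the dual frame $\tilde{\Phi}$, of size $N \times M$, and its complement $\tilde{\Phi}^\perp$, of size $(M-N) \times M$, with, similarly to (10.38),

$$\tilde{\Phi}\Phi^* = I_{N \times N}, \qquad (10.53a)$$

$$\tilde{\Phi}(\Phi^\perp)^* = 0_{N \times (M-N)}, \qquad (10.53b)$$

$$\tilde{\Phi}^\perp\Phi^* = 0_{(M-N) \times N}, \qquad (10.53c)$$

$$\tilde{\Phi}^\perp(\Phi^\perp)^* = I_{(M-N) \times (M-N)}. \qquad (10.53d)$$

As in (10.39a), $S$ is the subspace of $\mathbb{C}^M$ spanned by the columns of $\tilde{\Phi}^*$, while $S^\perp$ is the subspace of $\mathbb{C}^M$ spanned by the columns of $(\tilde{\Phi}^\perp)^*$. We will see when discussing projection operators shortly that

$$\operatorname{span}(\tilde{\Phi}^*) = \operatorname{span}(\Phi^*), \qquad (10.54a)$$

$$\operatorname{span}((\tilde{\Phi}^\perp)^*) = \operatorname{span}((\Phi^\perp)^*), \qquad (10.54b)$$

and thus, from now on, we will use

$$S = \operatorname{span}(\Phi^*). \qquad (10.55)$$

As before, an arbitrary vector of expansion coefficients $\alpha' \in \mathbb{C}^M$ can be written as

$$\alpha' = \alpha + \alpha^\perp, \qquad (10.56a)$$

with $\alpha \in S$ and $\alpha^\perp \in S^\perp$, and thus

$$\langle \alpha, \alpha^\perp \rangle = 0. \qquad (10.56b)$$

140 Note that $\operatorname{span}(\Phi^*) = \operatorname{span}(\Phi^T)$.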

Frame Operator T When calculating the dual frame, the so-called canonical dual of $\Phi$, the product $\Phi\Phi^*$ is central; we call it

$$T_{N \times N} = \Phi\Phi^*. \qquad (10.57)$$

It is a Hermitian and positive definite matrix (see (1.224)), and thus, all of its eigenvalues are real and positive. According to (1.223a), $T$ can be diagonalized as

$$T = \Phi\Phi^* = U \Lambda U^*, \qquad (10.58)$$

where $\Lambda$ is a diagonal matrix of eigenvalues, and $U$ a unitary matrix of eigenvectors. The largest, $\lambda_{\max}$, and smallest, $\lambda_{\min}$, eigenvalues play a special role. For tight frames, $\lambda_{\max} = \lambda_{\min} = \lambda$, and $T$ is a scaled identity, $T = \lambda I$, as it possesses a single eigenvalue $\lambda$ of multiplicity $N$.

Energy of Expansion Coefficients In the examples in the introduction as well as for tight frames earlier, we have seen how the energy of the expansion coefficients is conserved or bounded, in (10.12), (10.26), and (10.31), respectively. We now look into it for general frames by computing the energy of the expansion coefficients $\tilde{\alpha}$ as

$$\|\tilde{\alpha}\|^2 = \tilde{\alpha}^*\tilde{\alpha} \overset{(a)}{=} x^*\Phi\Phi^* x \overset{(b)}{=} x^* U \Lambda U^* x, \qquad (10.59)$$

where (a) follows from (10.52b) and (b) from (10.58). Thus, using (10.25b),

$$\lambda_{\min}\|x\|^2 \le \|\tilde{\alpha}\|^2 \le \lambda_{\max}\|x\|^2. \qquad (10.60a)$$

Therefore, the energy, while not preserved, is bounded from below and above by the extreme eigenvalues of $T$. How close (tight) these eigenvalues are will influence the quality of the frame in question, as we will see later. The same argument can be repeated for $\|\alpha\|$, using $\tilde{\Phi}\tilde{\Phi}^* = T^{-1}$, leading to

$$\frac{1}{\lambda_{\max}}\|x\|^2 \le \|\alpha\|^2 \le \frac{1}{\lambda_{\min}}\|x\|^2. \qquad (10.60b)$$

Relation to Tight Frames Given a general frame $\Phi$, we can easily transform it into a tight frame $\Phi'$. We do this by diagonalizing $T$ as in (10.58). Then the tight frame is obtained as

$$\Phi' = U \Lambda^{-1/2} U^* \Phi = T^{-1/2}\Phi,$$

which is 1-tight, since $\Phi'\Phi'^* = U\Lambda^{-1/2}U^* \, U\Lambda U^* \, U\Lambda^{-1/2}U^* = I$.

Frame Operators

The pair of inequalities (10.60a) leads to an alternate definition of a frame, similar in spirit to Definition 1.42 for biorthogonal bases:

Definition 10.6 (Frame) A family $\Phi = \{\varphi_i\}_{i=0}^{M-1}$ in $\mathbb{C}^N$ is called a frame when there exist two constants $0 < \lambda_{\min} \le \lambda_{\max} < \infty$, such that for all $x \in \mathbb{C}^N$,

$$\lambda_{\min}\|x\|^2 \le \sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2 \le \lambda_{\max}\|x\|^2, \qquad (10.61)$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are called the lower and upper frame bounds.
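The energy bounds (10.60a)-(10.60b), which are exactly the frame inequalities (10.61) with the extreme eigenvalues of $T$ as bounds, and the tightening $\Phi' = T^{-1/2}\Phi$ described above can be checked on an assumed random real frame. The Python/NumPy sketch below uses `numpy.linalg.eigh`, which returns eigenvalues in ascending order; the small tolerance `tol` only guards against floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 3, 6
Phi = rng.standard_normal((N, M))                  # assumed real example frame

T = Phi @ Phi.T
lam, U = np.linalg.eigh(T)                         # T = U diag(lam) U^T, ascending eigenvalues
lam_min, lam_max = lam[0], lam[-1]

x = rng.standard_normal(N)
nx = x @ x
e_tilde = np.sum((Phi.T @ x) ** 2)                 # ||alpha~||^2 with alpha~ = Phi^T x
e = np.sum((Phi.T @ np.linalg.solve(T, x)) ** 2)   # ||alpha||^2 with alpha = Phi^T T^{-1} x
tol = 1e-12 * nx * lam_max
bounds_a = lam_min * nx - tol <= e_tilde <= lam_max * nx + tol     # (10.60a)
bounds_b = nx / lam_max - tol <= e <= nx / lam_min + tol           # (10.60b)

# Tightening: Phi' = U Lambda^{-1/2} U^T Phi = T^{-1/2} Phi is 1-tight.
Phi_tight = U @ np.diag(lam ** -0.5) @ U.T @ Phi
print(bounds_a, bounds_b, np.allclose(Phi_tight @ Phi_tight.T, np.eye(N)))
```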



Because of (10.60a), the best (tightest) frame bounds are the smallest and largest eigenvalues of $T$, as we have seen previously. From the definition, we can also understand the meaning of $\lambda$ in (10.31): tight frames are obtained when the two frame bounds are equal, that is, when $\lambda_{\max} = \lambda_{\min} = \lambda$.

The operators we have seen so far are sometimes called: analysis frame operator $\tilde{\Phi}^*$, synthesis frame operator $\Phi$, and frame operator $T = \Phi\Phi^*$. The analysis frame operator is one of many, as there exist infinitely many dual frames for a given frame $\Phi$. In our finite-dimensional setting, the analysis frame operator maps an input $x \in \mathbb{C}^N$ onto a subspace of $\mathbb{C}^M$, namely, $\alpha = \tilde{\Phi}^* x$ belongs to the subspace $S$ spanned by the columns of $\tilde{\Phi}^*$, as we have seen in (10.55).¹⁴¹ These, together with other frame operators introduced shortly, are summarized in Table 10.1.

Given $x \in \mathbb{C}^N$, the frame operator $T = \Phi\Phi^*$ is a linear operator from $\mathbb{C}^N$ to $\mathbb{C}^N$, guaranteed to be of full rank $N$ since (10.61) ensures that $\lambda_{\min} > 0$. Also,

$$\lambda_{\min} I \le T \le \lambda_{\max} I. \qquad (10.62)$$



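141 Remember that $S$ is spanned by either $\Phi^*$ or $\tilde{\Phi}^*$.

On the other hand, given $x \in \mathbb{C}^M$, the operator $\Phi^*\Phi$, which we have seen before as a projection operator (10.10) in our simple example, maps the input onto a subspace $S$ of $\mathbb{C}^M$. We called that operator a Gram operator in (1.112), $G = \Phi^*\Phi$:

$$G = \Phi^*\Phi = \begin{bmatrix} \varphi_0^* \\ \varphi_1^* \\ \vdots \\ \varphi_{M-1}^* \end{bmatrix} \begin{bmatrix} \varphi_0 & \varphi_1 & \cdots & \varphi_{M-1} \end{bmatrix} = \begin{bmatrix} \|\varphi_0\|^2 & \langle \varphi_1, \varphi_0 \rangle & \cdots & \langle \varphi_{M-1}, \varphi_0 \rangle \\ \langle \varphi_1, \varphi_0 \rangle^* & \|\varphi_1\|^2 & \cdots & \langle \varphi_{M-1}, \varphi_1 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \varphi_{M-1}, \varphi_0 \rangle^* & \langle \varphi_{M-1}, \varphi_1 \rangle^* & \cdots & \|\varphi_{M-1}\|^2 \end{bmatrix}_{M \times M} = G^*, \qquad (10.63)$$

where the entry in row $i$, column $j$ is $\varphi_i^*\varphi_j = \langle \varphi_j, \varphi_i \rangle$.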



This matrix $G$ contains correlations between different frame vectors, and, while of size $M \times M$, it is of rank $N$ only.

The frame operator $T = \Phi\Phi^*$ and the Gram operator $G = \Phi^*\Phi$ have the same nonzero eigenvalues (see Section 1.B.2) and thus the same trace. This fact can be used to show that the sum of the eigenvalues of $T$ is equal to the sum of the squared norms of the frame vectors. We state this and three further useful frame facts here; their proofs are left for Exercise 10.5:

$$\sum_{j=0}^{N-1} \lambda_j = \sum_{i=0}^{M-1} \|\varphi_i\|^2, \qquad (10.64a)$$

$$Tx = \sum_{i=0}^{M-1} \langle x, \varphi_i \rangle \varphi_i, \qquad (10.64b)$$

$$\langle x, Tx \rangle = \sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2, \qquad (10.64c)$$

$$\sum_{i=0}^{M-1} \langle \varphi_i, T\varphi_i \rangle = \sum_{i,j=0}^{M-1} |\langle \varphi_i, \varphi_j \rangle|^2. \qquad (10.64d)$$
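The four frame facts (10.64a)-(10.64d), and the claim that $T$ and $G$ share their nonzero eigenvalues, can be confirmed numerically before proving them. The Python/NumPy sketch below assumes the inner-product convention $\langle a, b \rangle = b^* a$ (linear in the first argument), which matches $\Phi^* x$ having entries $\langle x, \varphi_i \rangle$; the frame itself is an arbitrary random example.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 3, 7
Phi = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))   # assumed example frame
T = Phi @ Phi.conj().T        # frame operator
G = Phi.conj().T @ Phi        # Gram operator
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def inner(a, b):
    """<a, b> = b^* a, linear in the first argument (assumed convention)."""
    return np.vdot(b, a)

cols = [Phi[:, i] for i in range(M)]
check_a = np.isclose(np.sum(np.linalg.eigvalsh(T)), sum(inner(p, p).real for p in cols))  # (10.64a)
check_b = np.allclose(T @ x, sum(inner(x, p) * p for p in cols))                           # (10.64b)
check_c = np.isclose(inner(x, T @ x), sum(abs(inner(x, p)) ** 2 for p in cols))            # (10.64c)
check_d = np.isclose(sum(inner(p, T @ p) for p in cols), np.sum(np.abs(G) ** 2))            # (10.64d)

# T and G share their nonzero eigenvalues; G has M - N additional (numerically) zero ones.
eig_T = np.linalg.eigvalsh(T)               # ascending order
eig_G = np.linalg.eigvalsh(G)[M - N:]       # the N largest eigenvalues of G
shared = np.allclose(eig_T, eig_G)
print(check_a, check_b, check_c, check_d, shared)
```

Note that (10.64d) equals the frame potential (10.42), which is one route to the minimization results stated earlier.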



Dual Frame Operators We have already discussed the dual frame operator in (10.48a); we now formalize it a bit more.

Definition 10.7 (Canonical dual frame) Given a frame satisfying (10.61), its canonical dual frame $\tilde{\Phi}$ and dual frame vectors are:

$$\tilde{\Phi} = (\Phi\Phi^*)^{-1}\Phi = T^{-1}\Phi, \qquad (10.65a)$$

$$\tilde{\varphi}_i = (\Phi\Phi^*)^{-1}\varphi_i = T^{-1}\varphi_i. \qquad (10.65b)$$

From (10.60b), we can say that the dual frame $\tilde{\Phi}$ is a frame with frame bounds $1/\lambda_{\max}$ and $1/\lambda_{\min}$. We also see that

$$\tilde{\Phi}\tilde{\Phi}^* = T^{-1}\Phi\Phi^*(T^{-1})^* = T^{-1}. \qquad (10.66)$$




What is particular about this canonical dual frame is that among all right inverses of $\Phi$, $\tilde{\Phi}^*$ leads to the smallest expansion coefficients $\alpha$ in Euclidean norm, as shown in Solved Exercise 10.2. We will also see later in this section that expansion coefficients $\alpha'$ other than the coefficients $\alpha$ obtained from the canonical dual might be more appropriate when minimizing other norms (such as the $\ell^1$ or $\ell^\infty$ norm).

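From (10.65b), we see that to compute those dual frame vectors, we need to invert $T$. While in finite dimensions, and for reasonable $M$ and $N$, this is not a problem, it becomes an issue as $M$ and $N$ grow. In that case, the inverse can be computed via a series

$$T^{-1} = \frac{2}{\lambda_{\min} + \lambda_{\max}} \sum_{k=0}^{\infty} \left(I - \frac{2}{\lambda_{\min} + \lambda_{\max}} T\right)^k, \qquad (10.67)$$

which converges because the bracketed matrix has spectral norm $(\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min}) < 1$, and converges faster when the frame bounds $\lambda_{\min}$ and $\lambda_{\max}$ are close, that is, when the frame is close to being tight. Solved Exercise 10.3 sketches a proof of (10.67), and Solved Exercise 10.4 illustrates it with examples.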
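A truncated version of the series (10.67) only needs matrix products, which is what makes it attractive when $T$ is large. The Python/NumPy sketch below compares it against a direct inverse; the example frame (a random $4 \times 40$ matrix, chosen so that its frame bounds are not too far apart) and the numbers of terms are assumptions made for illustration.

```python
import numpy as np

def inverse_frame_operator(T, lam_min, lam_max, terms):
    """Truncated series (10.67): T^{-1} ~ c * sum_{k=0}^{terms-1} (I - c T)^k, c = 2/(lam_min + lam_max)."""
    n = T.shape[0]
    c = 2.0 / (lam_min + lam_max)
    R = np.eye(n) - c * T            # spectral norm (lam_max - lam_min)/(lam_max + lam_min) < 1
    S = np.zeros((n, n))
    P = np.eye(n)                    # current power R^k
    for _ in range(terms):
        S += P
        P = P @ R
    return c * S

rng = np.random.default_rng(5)
Phi = rng.standard_normal((4, 40))   # assumed example frame with redundancy 10
T = Phi @ Phi.T
lam = np.linalg.eigvalsh(T)          # ascending, so lam[0] = lam_min and lam[-1] = lam_max
T_inv = np.linalg.inv(T)
err_10 = np.max(np.abs(inverse_frame_operator(T, lam[0], lam[-1], 10) - T_inv))
err_200 = np.max(np.abs(inverse_frame_operator(T, lam[0], lam[-1], 200) - T_inv))
print(err_10, err_200)               # the error shrinks geometrically with the number of terms
```

The truncation error after $K$ terms equals $R^K T^{-1}$, so it decays like $\left((\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min})\right)^K$.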

Projection Operators We have seen various versions of frame operators, mapping $\mathbb{C}^N$ to $\mathbb{C}^N$, as well as the Gram operator that maps $\mathbb{C}^M$ to $\mathbb{C}^M$. We now look at two other operators, $P = \tilde{\Phi}^*\Phi$ and $\tilde{P} = \Phi^*\tilde{\Phi}$. In fact, these are the same, as

$$P = \tilde{\Phi}^*\Phi = ((\Phi\Phi^*)^{-1}\Phi)^*\Phi = \Phi^*(\Phi\Phi^*)^{-1}\Phi = \Phi^*\tilde{\Phi} = \tilde{P}. \qquad (10.68)$$

Therefore, $P$ maps $\mathbb{C}^M$ to a subspace $S$ of $\mathbb{C}^M$, and is an orthogonal projection operator, as it is idempotent and self-adjoint (Definition 1.27):

$$P^2 = (\tilde{\Phi}^*\Phi)(\tilde{\Phi}^*\Phi) \overset{(a)}{=} \tilde{\Phi}^*\Phi = P,$$

$$P^* = (\tilde{\Phi}^*\Phi)^* = \Phi^*\tilde{\Phi} \overset{(b)}{=} \Phi^*(T^{-1}\Phi) \overset{(c)}{=} (T^{-1}\Phi)^*\Phi = \tilde{\Phi}^*\Phi = P,$$

where (a) follows from (10.49a); (b) from (10.65a); and (c) from $T$ being Hermitian and thus self-adjoint. This projection operator projects the input onto the column space of $\tilde{\Phi}^*$, or, since $P$ and $\tilde{P}$ are the same, onto the column space of $\Phi^*$. Table 10.1 summarizes various operators we have seen until now, Table 10.2 does so for frame expansions, and Table 10.3 summarizes various classes of frames and their properties, while Figure 10.7 does so pictorially.
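The identities (10.68) and the two projection properties can be confirmed on an assumed random complex frame with the Python/NumPy sketch below; it also checks that the part of an arbitrary $\alpha'$ removed by $P$ lies in the null space of $\Phi$, which is what makes $S^\perp$ irrelevant for synthesis.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 2, 5
Phi = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))   # assumed generic frame
T = Phi @ Phi.conj().T
Phi_dual = np.linalg.solve(T, Phi)             # canonical dual (10.65a)

P = Phi_dual.conj().T @ Phi                    # P  = dual^* Phi
P_t = Phi.conj().T @ Phi_dual                  # P~ = Phi^* dual
same = np.allclose(P, P_t)                     # (10.68)
idempotent = np.allclose(P @ P, P)
self_adjoint = np.allclose(P, P.conj().T)
trace_is_N = np.isclose(np.trace(P).real, N)   # orthogonal projector onto an N-dimensional subspace

# Any alpha' splits into P alpha' in S and a remainder in the null space of Phi.
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
remainder_in_null = np.allclose(Phi @ (a - P @ a), 0)
print(same, idempotent, self_adjoint, trace_is_N, remainder_in_null)
```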

10.2.3 Choosing the Expansion Coefficients

Given a frame $\Phi$ and a vector $x$, we have seen in (10.52a) that the expansion coefficients are given by $\alpha = \tilde{\Phi}^* x$; for a $\lambda$-tight frame, this reduces to $\alpha = (1/\lambda)\Phi^* x$.

Operator                          Symbol           Expression                  Size
Synthesis frame operator          $\Phi$           $\Phi$                      $N \times M$
Dual (analysis) frame operator    $\tilde{\Phi}$   $(\Phi\Phi^*)^{-1}\Phi$     $N \times M$
Frame operator                    $T$              $\Phi\Phi^*$                $N \times N$
Gram operator                     $G$              $\Phi^*\Phi$                $M \times M$
Projection operator               $P$              $\tilde{\Phi}^*\Phi$        $M \times M$

Table 10.1: Frame operators.



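Expansion                          Coefficients                   Identity                       Bounds
$x = \Phi\alpha$                   $\alpha = \tilde{\Phi}^* x$    $\Phi\tilde{\Phi}^* = I$       $\lambda_{\min} I \le T \le \lambda_{\max} I$
$x = \tilde{\Phi}\tilde{\alpha}$   $\tilde{\alpha} = \Phi^* x$    $\tilde{\Phi}\Phi^* = I$       $(1/\lambda_{\max}) I \le T^{-1} \le (1/\lambda_{\min}) I$

Table 10.2: Frame expansions.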




Figure 10.7: Frames at a glance. Tight frames with $\lambda = 1$ and unit-norm vectors lead to orthonormal bases.



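For frames, because $\Phi$ has a nontrivial null space, there exists an infinite set of possible expansion coefficients (see also Solved Exercise 10.2). That is, given a frame $\Phi$ and its canonical dual $\tilde{\Phi}$ from (10.65a), and using (10.56a), we can write $x$ as

$$x = \Phi\alpha' = \Phi(\alpha + \alpha^\perp) = \Phi\alpha, \qquad (10.69)$$

where $\alpha'$ is a possible vector of expansion coefficients from $\mathbb{C}^M$, $\alpha$ is its unique projection onto $S$, and $\alpha^\perp$ is an arbitrary vector in $S^\perp$ (the null space of $\Phi$). Within this infinite set of possible expansion coefficients, we can choose particular solutions by imposing further constraints on $\alpha'$. Typically, this is done by minimizing a particular norm, some of which we discuss now.

Frame         Constraints                                   Properties
General       $\{\varphi_i\}_{i=0}^{M-1}$ is a frame        $\lambda_{\min}\|x\|^2 \le \sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2 \le \lambda_{\max}\|x\|^2$
              for $\mathbb{C}^N$                            $\lambda_{\min} I \le T \le \lambda_{\max} I$
                                                            $\operatorname{tr}(T) = \sum_{j=0}^{N-1} \lambda_j = \operatorname{tr}(G) = \sum_{i=0}^{M-1} \|\varphi_i\|^2$
Equal-norm    $\|\varphi_i\| = \|\varphi_j\| = \varphi$      $\lambda_{\min}\|x\|^2 \le \sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2 \le \lambda_{\max}\|x\|^2$
              for all $i$ and $j$                           $\lambda_{\min} I \le T \le \lambda_{\max} I$
                                                            $\operatorname{tr}(T) = \sum_{j=0}^{N-1} \lambda_j = \operatorname{tr}(G) = \sum_{i=0}^{M-1} \|\varphi_i\|^2 = M\varphi^2$
Tight         $\lambda_{\min} = \lambda_{\max} = \lambda$    $\sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2 = \lambda\|x\|^2$
                                                            $T = \lambda I$
                                                            $\operatorname{tr}(T) = \sum_{j=0}^{N-1} \lambda_j = N\lambda = \operatorname{tr}(G) = \sum_{i=0}^{M-1} \|\varphi_i\|^2$
Orthonormal   $\lambda_{\min} = \lambda_{\max} = 1$         $\sum_{i=0}^{M-1} |\langle x, \varphi_i \rangle|^2 = \|x\|^2$
basis         $\|\varphi_i\| = 1$ for all $i$               $T = I$
              $N = M$                                       $\operatorname{tr}(T) = \sum_{j=0}^{N-1} \lambda_j = N = \operatorname{tr}(G) = \sum_{i=0}^{M-1} \|\varphi_i\|^2 = M$

Table 10.3: Summary of properties for various classes of frames.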

Minimum $\ell^2$-Norm Solution Among all expansion vectors $\alpha'$ such that $\Phi\alpha' = x$, the smallest $\ell^2$ norm is attained by the canonical-dual expansion:

$$\min \|\alpha'\|^2 = \|\alpha\|^2, \qquad (10.70)$$

where $\alpha = \tilde{\Phi}^* x$ is the expansion computed with respect to the canonical dual frame. The proof of this fact is along the lines of what we have seen in the introduction for the frame (10.1), since, from (10.69),

$$\|\alpha'\|^2 = \|\alpha\|^2 + \|\alpha^\perp\|^2 + 2\,\Re\langle \alpha, \alpha^\perp \rangle \overset{(a)}{=} \|\alpha\|^2 + \|\alpha^\perp\|^2,$$

where (a) follows from (10.56b). The minimum is achieved for $\alpha^\perp = 0$.
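The Pythagorean argument above is easy to see numerically: adding any null-space component to the canonical-dual coefficients keeps the reconstruction exact but only increases the $\ell^2$ norm. In the Python/NumPy sketch below, the frame and the null-space component are arbitrary random choices, and the comparison with `numpy.linalg.pinv` reflects the standard fact that, for a full-row-rank $\Phi$, the Moore-Penrose solution is $\Phi^*(\Phi\Phi^*)^{-1}x = \tilde{\Phi}^* x$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M = 3, 6
Phi = rng.standard_normal((N, M))                     # assumed real example frame
x = rng.standard_normal(N)

alpha = Phi.T @ np.linalg.solve(Phi @ Phi.T, x)       # canonical-dual coefficients
_, _, Vh = np.linalg.svd(Phi)
null_basis = Vh[N:].T                                 # M x (M - N) orthonormal basis of S^perp
alpha_perp = null_basis @ rng.standard_normal(M - N)  # arbitrary component in S^perp
alpha_prime = alpha + alpha_perp                      # another valid expansion, cf. (10.69)

valid = np.allclose(Phi @ alpha_prime, x)
orthogonal = np.isclose(alpha @ alpha_perp, 0.0)
pythagoras = np.isclose(alpha_prime @ alpha_prime, alpha @ alpha + alpha_perp @ alpha_perp)
matches_pinv = np.allclose(alpha, np.linalg.pinv(Phi) @ x)
print(valid, orthogonal, pythagoras, matches_pinv,
      np.linalg.norm(alpha) <= np.linalg.norm(alpha_prime))
```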

Since $\Phi$ contains sets of $N$ linearly independent vectors (often a very large number of such sets), we can write $x$ as a linear combination of $N$ vectors from one such set, that is, $\alpha'$ will contain exactly $N$ nonzero coefficients and will be sparse.¹⁴² On the other hand, the minimum $\ell^2$-norm expansion coefficients $\alpha$, using the canonical dual, will typically contain $M$ nonzero coefficients. We illustrate this in the following example:

Example 10.2 (Nonuniqueness of the dual frame) Take $\mathbb{R}^2$ and the unit-norm tight frame covering the unit circle at angles $2\pi i/M$, for $i = 0, 1, \ldots, M-1$, an example of which we have already seen in (10.3) for $M = 3$. For $M = 5$, we get

$$\Phi = \begin{bmatrix} 1 & \tfrac{1}{4}(-1 + \sqrt{5}) & -\tfrac{1}{4}(1 + \sqrt{5}) & -\tfrac{1}{4}(1 + \sqrt{5}) & \tfrac{1}{4}(-1 + \sqrt{5}) \\ 0 & \tfrac{1}{2\sqrt{2}}\sqrt{5 + \sqrt{5}} & \tfrac{1}{2\sqrt{2}}\sqrt{5 - \sqrt{5}} & -\tfrac{1}{2\sqrt{2}}\sqrt{5 - \sqrt{5}} & -\tfrac{1}{2\sqrt{2}}\sqrt{5 + \sqrt{5}} \end{bmatrix}.$$

The dual frame is just a scaled version of the frame itself,

$$\tilde{\Phi} = \frac{2}{5}\Phi.$$

For an arbitrary $x$, $\alpha = \tilde{\Phi}^* x$ will typically have 5 nonzero coefficients, but no fewer than 4 (when $x$ is orthogonal to one of the $\varphi_i$). On the other hand, every set $\{\varphi_i, \varphi_j\}$, $i \neq j$, is a biorthogonal basis for $\mathbb{R}^2$, meaning we can achieve an expansion with only 2 nonzero coefficients. Specifically, choose a biorthogonal basis $\Psi = \{\varphi_i, \varphi_j\}$, calculate its dual basis $\tilde{\Psi} = \{\tilde{\varphi}_i, \tilde{\varphi}_j\}$, and choose $\alpha'$ as

$$\alpha'_k = \begin{cases} \langle x, \tilde{\varphi}_k \rangle, & k = i \text{ or } k = j; \\ 0, & \text{otherwise.} \end{cases}$$

We have $\binom{5}{2} = 10$ possible bases; which ones are the best? As usual, those closer to an orthonormal basis will be better, because they are better conditioned. Thus, we should look for pairs $\{\varphi_i, \varphi_j\}$ whose inner product $|\langle \varphi_i, \varphi_j \rangle|$ is as small as possible. To do this, calculate the Gram operator (10.63) and take the absolute values of its entries $|\langle \varphi_i, \varphi_j \rangle|$:

$$\frac{1}{4}\begin{bmatrix} 4 & \sqrt{5}-1 & \sqrt{5}+1 & \sqrt{5}+1 & \sqrt{5}-1 \\ \sqrt{5}-1 & 4 & \sqrt{5}-1 & \sqrt{5}+1 & \sqrt{5}+1 \\ \sqrt{5}+1 & \sqrt{5}-1 & 4 & \sqrt{5}-1 & \sqrt{5}+1 \\ \sqrt{5}+1 & \sqrt{5}+1 & \sqrt{5}-1 & 4 & \sqrt{5}-1 \\ \sqrt{5}-1 & \sqrt{5}+1 & \sqrt{5}+1 & \sqrt{5}-1 & 4 \end{bmatrix},$$

and we see, as is obvious from the geometry of the problem, that 5 bases, $\{\{\varphi_0, \varphi_1\}, \{\varphi_0, \varphi_4\}, \{\varphi_1, \varphi_3\}, \{\varphi_2, \varphi_4\}, \{\varphi_3, \varphi_4\}\}$, have a minimum inner product, those with $|\langle \varphi_i, \varphi_j \rangle| = (\sqrt{5} - 1)/4 \approx 0.31$. Now which of these to choose? If we do not take $x$ into account, it really does not matter. However, if we do take it into account, then it makes sense to first choose a vector $\varphi_i$ that is most aligned with $x$:

$$\max_i |\langle x, \varphi_i \rangle|.$$

Assume $\varphi_0$ is chosen, that is, $x$ is in the shaded region in Figure 10.8. Then, either $\varphi_1$ or $\varphi_4$ can be used. Let us choose an $x$ in the shaded region, say $x = [\sqrt{3}/2 \ \ 1/2]^T$, and compute both $\alpha = \tilde{\Phi}^* x$, as well as $\alpha' = \tilde{\Psi}^* x$ with the biorthogonal basis $\Psi = \{\varphi_0, \varphi_1\}$. Then,

$$\alpha = [0.34641 \ \ 0.297258 \ \ -0.162695 \ \ -0.397809 \ \ -0.0831647]^T,$$

$$\alpha' = [0.703566 \ \ 0.525731 \ \ 0 \ \ 0 \ \ 0]^T.$$

142 Remember that by sparse we mean an expansion that uses only $N$ out of the $M$ frame vectors.

Figure 10.8: Unit-norm tight frame in $\mathbb{R}^2$. Those $x$ belonging to the shaded region have the maximum inner product (in magnitude) with $\varphi_0$. One can then choose $\varphi_1$ or $\varphi_4$ as the other vector in a biorthogonal expansion.



As expected, $\alpha$ has 5 nonzero coefficients, while $\alpha'$ has only 2. Then,

$$\|\alpha\|_2 = 0.63246 < 0.87829 = \|\alpha'\|_2, \qquad (10.71)$$

as expected, as $\alpha$ has the minimum $\ell^2$ norm. However,

$$\|\alpha\|_1 = 1.28734 > 1.22930 = \|\alpha'\|_1, \qquad (10.72)$$

and thus, the sparser expansion is worse with respect to the $\ell^2$ norm, but is better with respect to the $\ell^1$ norm, illustrating the wide range of possibilities for expansions in frames, as well as algorithmic issues that will be explored later.
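The numbers in this example can be reproduced in a few lines. The Python/NumPy sketch below builds the $M = 5$ frame from its angles rather than from the closed-form entries, uses $\tilde{\Phi} = (2/5)\Phi$ for $\alpha$, and obtains $\alpha'$ by solving the $2 \times 2$ system for the pair $\{\varphi_0, \varphi_1\}$, which is equivalent to taking inner products with the dual basis.

```python
import numpy as np

M = 5
angles = 2 * np.pi * np.arange(M) / M
Phi = np.vstack([np.cos(angles), np.sin(angles)])   # unit-norm tight frame, lambda = M/N = 5/2
x = np.array([np.sqrt(3) / 2, 0.5])

alpha = (2 / 5) * Phi.T @ x                         # canonical dual is (2/5) Phi

# Sparse expansion in the biorthogonal basis {phi_0, phi_1}; solving the 2 x 2 system
# gives the inner products with the dual basis vectors.
alpha_prime = np.zeros(M)
alpha_prime[[0, 1]] = np.linalg.solve(Phi[:, [0, 1]], x)

for name, a in (("alpha", alpha), ("alpha'", alpha_prime)):
    print(name, np.round(a, 6), np.linalg.norm(a, 2), np.linalg.norm(a, 1))
```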

Minimum $\ell^1$-Norm Solution Instead of the $\ell^2$ norm, we can minimize the $\ell^1$ norm. That is, solve

$$\min \|\alpha'\|_1 \quad \text{under the constraint} \quad \Phi\alpha' = x.$$

This can be turned into a linear program (see Section 10.6.3). Interestingly, minimizing the $\ell^1$ norm will promote sparsity.
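For real $\Phi$ and $x$, writing $\alpha' = u - v$ with $u, v \ge 0$ gives a linear program whose optimum is attained at a vertex, that is, at a solution supported on $N$ linearly independent columns. For tiny frames, one can therefore find the exact $\ell^1$ minimizer by trying every such support instead of calling an LP solver. The Python/NumPy sketch below does this for the frame and $x$ of Example 10.2; the function name is hypothetical, and the approach is only a brute-force illustration whose cost grows like $\binom{M}{N}$.

```python
import itertools
import numpy as np

def min_l1_by_enumeration(Phi, x, tol=1e-12):
    """Minimize ||a||_1 subject to Phi a = x, for real Phi and x and small M.

    The LP form of this problem attains its optimum at a vertex, i.e., at a solution
    supported on N linearly independent columns, so trying every such support suffices.
    """
    N, M = Phi.shape
    best, best_norm = None, np.inf
    for J in itertools.combinations(range(M), N):
        sub = Phi[:, list(J)]
        if abs(np.linalg.det(sub)) < tol:      # skip linearly dependent column sets
            continue
        a = np.zeros(M)
        a[list(J)] = np.linalg.solve(sub, x)
        if np.sum(np.abs(a)) < best_norm:
            best, best_norm = a, np.sum(np.abs(a))
    return best, best_norm

M = 5
angles = 2 * np.pi * np.arange(M) / M
Phi = np.vstack([np.cos(angles), np.sin(angles)])    # frame of Example 10.2
x = np.array([np.sqrt(3) / 2, 0.5])
a, l1 = min_l1_by_enumeration(Phi, x)
print(np.round(a, 6), l1)
```

Over all ten pairs, this input selects $\{\varphi_0, \varphi_3\}$, with an $\ell^1$ norm of about 1.0285; the minimizer has only $N = 2$ nonzero entries, illustrating the sparsity-promoting effect. A practical implementation would instead pass the linear program to a solver.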

Example 10.3 (Nonuniqueness of the dual frame (cont'd)) We now continue our previous example and calculate the expansion coefficients for the 5 biorthogonal bases $\Psi_{01} = \{\varphi_0, \varphi_1\}$, $\Psi_{04} = \{\varphi_0, \varphi_4\}$, $\Psi_{13} = \{\varphi_1, \varphi_3\}$, $\Psi_{24} = \{\varphi_2, \varphi_4\}$, $\Psi_{34} = \{\varphi_3, \varphi_4\}$. These, and their $\ell^1$ norms, are as follows (we have already computed $\alpha'_{01} = \alpha'$ above but repeat it here for completeness):

Coefficients       Nonzero entries            $\|\alpha'\|_1$
$\alpha'_{01}$      0.703566    0.525731      1.22930
$\alpha'_{04}$      1.028490   -0.525731      1.55422
$\alpha'_{13}$     -0.177834   -1.138390      1.31623
$\alpha'_{24}$     -1.664120   -1.554221      3.21834
$\alpha'_{34}$     -1.028490    0.109908      1.13839

So we see that even among sparse expansions with exactly 2 nonzero coefficients there are differences. In this particular case, among these five bases, $\Psi_{34}$ has the lowest $\ell^1$ norm.

Minimum $\ell^0$-Norm Solution The $\ell^0$ norm simply counts the number of nonzero entries in a vector:

$$\|x\|_0 = \lim_{p \to 0} \sum_{k \in \mathbb{Z}} |x_k|^p, \qquad (10.73)$$

with $0^0 = 0$. Since a frame with $M$ vectors in an $N$-dimensional space necessarily has a set of $N$ linearly independent vectors, we can take these as a basis, compute the biorthogonal dual basis, and find an expansion $\alpha'$ with exactly $N$ nonzero components (as we have just done in Example 10.2). Usually, there are many such sets (see Exercise 10.4), all leading to an expansion with $N$ nonzero coefficients. Among these multiple solutions, we may want to choose the one with the least $\ell^2$ norm. This shows that there exists a sparse expansion, very different from the expansion that minimizes the $\ell^2$ norm (which typically uses all $M$ frame vectors and is thus not sparse).

Minimum $\ell^\infty$-Norm Solution Among possible expansion coefficients $\alpha'$, we can also choose the one that minimizes the maximum value $|\alpha'_i|$. That is, solve

$$\min \|\alpha'\|_\infty \quad \text{under the constraint} \quad \Phi\alpha' = x.$$

This optimization problem can be solved using TBD. While such a solution is useful when one wants to avoid large coefficients, minimizing the $\ell^2$ norm achieves a similar goal.

Choosing the Expansion Coefficients In summary, we have seen that the nonuniqueness of possible frame expansion coefficients leaves us with freedom to optimize some other criteria. For example, for a sparse expansion using only a few vectors from the frame, minimizing the $\ell^0$ norm is a possible route, although computationally difficult. Instead, minimizing the $\ell^1$ norm achieves a similar goal (as we will see in Chapter 13) and can be done with an efficient algorithm, namely, linear programming (see Section 10.6.3). Minimizing the $\ell^2$ norm does not lead to sparsity; instead, it promotes small coefficients, similarly to minimizing the maximum absolute value of the coefficients, or the $\ell^\infty$ norm. We illustrate this discussion with a simple example:

Example 10.4 (Different norms lead to different expansions) Consider 
the simplest example, N = 1, M = 2. As a frame and its dual, choose 

$ = - [1 2] $=[12] $$* = I. 

o 

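Given an input $x$, the set of all expansion coefficients $\alpha'$ that lead to $x = \Phi\alpha'$ is described by

$$\alpha' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} x + \begin{bmatrix} 2 \\ -1 \end{bmatrix} \gamma, \qquad \gamma \in \mathbb{R},$$

since the first term is colinear with $\Phi$, while the second is orthogonal to $\Phi$. In Figure 10.9 we show $\alpha'$ for $x = 1$. It is a line of slope $-1/2$ passing through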
Figure 10.9: The space of possible expansion coefficients in the frame $\Phi = (1/5)[1\;\,2]$, and the subspace $\alpha' = [1\;\,2]^T x + [2\;\,-1]^T \gamma$ for $x = 1$. To find the points of minimum $\ell^1$, $\ell^2$, and $\ell^\infty$ norms, we grow a diamond, a circle, and a square, respectively, and find the intercept points with the subspace $\alpha'$ (see also Figure 1.7 showing points with constant $\ell^1$, $\ell^2$, and $\ell^\infty$ norms). These are $D = [0\;\,5/2]$, $C = [1\;\,2]$, and $S = [5/3\;\,5/3]$, respectively.



the point $[1\;\,2]$: $\alpha'_1 = -\tfrac{1}{2}\alpha'_0 + \tfrac{5}{2}$. We can choose any point on this line as a possible set $[\alpha'_0\;\,\alpha'_1]$ for reconstructing $x$ with the frame $\Phi$. Recalling Figure 1.7 depicting points with constant $\ell^1$, $\ell^2$, and $\ell^\infty$ norms, we now see what the solutions are to the minimization problem in different norms:

(i) Minimum $\ell^2$-norm solution: The points with the same $\ell^2$ norm form a circle. Thus, growing a circle from the origin to the intercept with $\alpha'$ yields the point $C = [1\;\,2]$ with the minimum $\ell^2$ norm (see Figure 10.9). From what we know about the $\ell^2$ norm, we could have also obtained it as the point on $\alpha'$ closest to the origin (the orthogonal projection of the origin onto the line of possible $\alpha'$).

(ii) Minimum $\ell^1$-norm solution: The points with the same $\ell^1$ norm form a diamond. Thus, growing a diamond from the origin to the intercept with $\alpha'$ yields the point $D = [0\;\,5/2]$ with the minimum $\ell^1$ norm (see Figure 10.9).

(iii) Minimum $\ell^\infty$-norm solution: The points with the same $\ell^\infty$ norm form a square. Thus, growing a square from the origin to the intercept with $\alpha'$ yields the point $S = [5/3\;\,5/3]$ with the minimum $\ell^\infty$ norm (see Figure 10.9).

The table below numerically compares these three cases (the minimum value for each norm is marked with an asterisk):

          ℓ¹       ℓ²       ℓ∞
    D    2.50*    2.50     2.50
    C    3.00     2.24*    2.00
    S    3.33     2.36     1.67*
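The table can be reproduced numerically with a minimal sketch that parameterizes the line of Example 10.4 as $\alpha' = [1\;\,2]^T x + [2\;\,-1]^T\gamma$ (with $x = 1$) and minimizes each norm over $\gamma$. The use of ternary search is our choice, not the text's; it works here because each norm restricted to a line is a convex function of $\gamma$, and for this line each minimizer is unique.

```python
import math

def alpha(gamma, x=1.0):
    """Point on the solution line alpha' = [1, 2]^T x + [2, -1]^T gamma."""
    return (x + 2.0 * gamma, 2.0 * x - gamma)

def norm(v, p):
    """l^p norm of a tuple; p = math.inf gives the maximum-magnitude norm."""
    if p == math.inf:
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def minimize_on_line(p, lo=-10.0, hi=10.0, iters=200):
    """Ternary search over gamma; valid because a norm restricted to a line is convex."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if norm(alpha(m1), p) < norm(alpha(m2), p):
            hi = m2
        else:
            lo = m1
    return alpha(0.5 * (lo + hi))

if __name__ == "__main__":
    for label, p in (("l1", 1), ("l2", 2), ("linf", math.inf)):
        a = minimize_on_line(p)
        row = "  ".join(f"{norm(a, q):.2f}" for q in (1, 2, math.inf))
        print(f"min {label}: alpha' = ({a[0]:.3f}, {a[1]:.3f})  l1, l2, linf norms: {row}")
```

The three minimizers come out as the points $D$, $C$, and $S$ of Figure 10.9.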



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavclets.org 



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovaccvic, and V. K. Goyal 



744 



Chapter 10. Local Fourier and Wavelet Frames on Sequences 



Figure 10.10: A filter-bank implementation of a frame expansion: it is an M-channel filter bank with sampling by N, M > N.

10.3 Oversampled Filter Banks 

This section develops necessary conditions for the design of oversampled filter banks implementing frame expansions. We consider mostly those filter banks implementing tight frames, as the general ones follow easily and can be found in the literature. As we have done for filter banks implementing basis expansions (Chapters 7–9), we also look into their polyphase representation.

From everything we have learned so far, we may expect to have an $M$-channel filter bank, where each channel corresponds to one of the template frame vectors (a couple of simple examples were given in Section 10.1 and illustrated in Figures 10.2 and 10.4). The infinite set of frame vectors is obtained by shifting the $M$ template ones by integer multiples of $N$, $N < M$; hence the redundancy of the system. This shifting can be modeled by the samplers in the system, as we have seen previously. Not surprisingly, then, a general oversampled filter bank implementing a frame expansion is given in Figure 10.10. We now go through the salient features in some detail; however, since this material is a simple extension of what we have seen previously for bases, we will be brief.

10.3.1 Tight Oversampled Filter Banks 

We now follow the structure of the previous section and show the filter-bank equivalent of the expansion, the expansion coefficients, and the geometry of the expansion, and we look into the polyphase decomposition as a standard analysis tool, as we have done in the previous chapters.

As opposed to the previous section, we now work in an infinite-dimensional space, $\ell^2(\mathbb{Z})$, where, formally, many things will look the same. However, we need to exercise care, and will point out specific instances when this is the case. Instead of a finite-dimensional matrix $\Phi$ as in (10.30), we now deal with an infinite-dimensional one, and with structure: the $M$ template frame vectors, $\varphi_0, \varphi_1, \ldots, \varphi_{M-1}$, repeat themselves shifted in time, much the same way they do for bases. Renaming them $g_0 = \varphi_0, g_1 = \varphi_1, \ldots, g_{M-1} = \varphi_{M-1}$, we get



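$$\Phi = \begin{bmatrix} \cdots & g_{0,n} & g_{1,n} & \cdots & g_{M-1,n} & g_{0,n-N} & g_{1,n-N} & \cdots & g_{M-1,n-N} & \cdots \end{bmatrix},$$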



just like for critically-sampled filter banks (those with the number of channel samples per unit of time conserved, that is, $M = N$, or, those implementing basis expansions), except for the larger number of template frame vectors.

Figure 10.11: A filter-bank implementation of a tight frame expansion.

We could easily implement the finite-dimensional frame expansions we have seen in the last section by just limiting the number of nonzero coefficients in $g_i$ to $N$, resulting in



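$$\Phi = \begin{bmatrix} \ddots & & & \\ & \Phi_0 & & \\ & & \Phi_0 & \\ & & & \ddots \end{bmatrix}, \qquad \Phi_0 = \begin{bmatrix} g_{0,0} & g_{1,0} & \cdots & g_{M-1,0} \\ g_{0,1} & g_{1,1} & \cdots & g_{M-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ g_{0,N-1} & g_{1,N-1} & \cdots & g_{M-1,N-1} \end{bmatrix},$$

that is, a block-diagonal matrix, with the finite-dimensional frame matrix $\Phi_0$ of size $N \times M$ on the diagonal. Recall that we concentrate on the tight-frame case, and therefore, $\Phi_0\Phi_0^* = I$.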

Expansion We can express the frame expansion formally in the same way as we did for finite-dimensional frames in (10.32) (again, because it is the tight-frame case):

$$\Phi\Phi^* = I, \tag{10.74}$$

except that we will always work with 1-tight frames, normalizing $\Phi$ by $1/\sqrt{A}$ if necessary, for the filter bank to be perfect reconstruction. Writing out the expansion, however, we see its infinite-dimensional aspect:

$$x = \sum_{i=0}^{M-1} \sum_{k\in\mathbb{Z}} \langle x, g_{i,n-Nk} \rangle\, g_{i,n-Nk}. \tag{10.75}$$

The process of computing the expansion coefficients is implemented via an analysis filter bank, filtering by the individual filters $g_{i,-n}$, $i = 0, 1, \ldots, M-1$, and downsampling by $N$, as on the left side of Figure 10.11:

$$\alpha = \Phi^* x, \qquad \alpha_{i,k} = \langle x, g_{i,n-Nk} \rangle, \tag{10.76}$$

while the process of reconstructing $x$ is implemented via a synthesis filter bank, upsampling by $N$ and filtering by the individual filters $g_{i,n}$, $i = 0, 1, \ldots, M-1$, as on the right side of Figure 10.11:

$$x = \Phi\alpha = \sum_{i=0}^{M-1} \sum_{k\in\mathbb{Z}} \alpha_{i,k}\, g_{i,n-Nk}. \tag{10.77}$$

In all of the above, $\Phi$ is an infinite matrix, and $\alpha$ and $x$ are infinite vectors.
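To see (10.76) and (10.77) in action, the following sketch implements the analysis and synthesis banks directly as inner products and sums over shifts by $N$, for a finite-length input that is zero outside its support. The filters are borrowed from Example 10.6 with $\theta_{01} = \pi/3$, which form a tight frame with $N = 2$, $M = 3$. The function names are illustrative only.

```python
import math
import random

N = 2                 # sampling factor
THETA = math.pi / 3   # theta_01, borrowed from Example 10.6
C, S = math.cos(THETA), math.sin(THETA)
FILTERS = [           # g_0, g_1, g_2 of Example 10.6 (real, length 3)
    [C / 2, S, C / 2],
    [S / 2, -C, S / 2],
    [0.5, 0.0, -0.5],
]

def analysis(x, filters, n_down):
    """alpha[i][k] = <x, g_{i, n - N k}> = sum_n x[n] g_i[n - N k] (real filters), cf. (10.76)."""
    length = max(len(g) for g in filters)
    # every k for which the shifted filter overlaps the support 0..len(x)-1 of x
    ks = range(-(length // n_down) - 1, len(x) // n_down + 1)
    coeffs = []
    for g in filters:
        row = {}
        for k in ks:
            row[k] = sum(x[m + n_down * k] * gm for m, gm in enumerate(g)
                         if 0 <= m + n_down * k < len(x))
        coeffs.append(row)
    return coeffs

def synthesis(coeffs, filters, n_down, length):
    """x_n = sum_i sum_k alpha[i][k] g_i[n - N k], for n = 0..length-1, cf. (10.77)."""
    y = [0.0] * length
    for g, row in zip(filters, coeffs):
        for k, a in row.items():
            for m, gm in enumerate(g):
                n = m + n_down * k
                if 0 <= n < length:
                    y[n] += a * gm
    return y

if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(-1.0, 1.0) for _ in range(16)]
    alpha = analysis(x, FILTERS, N)
    x_hat = synthesis(alpha, FILTERS, N, len(x))
    print("coefficients:", sum(len(r) for r in alpha), "for", len(x), "samples")
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, x_hat)))
```

Roughly $M/N = 3/2$ coefficients are produced per input sample, and the reconstruction error stays at floating-point level because $\Phi\Phi^* = I$.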

One can, of course, use the Fourier-domain or z-transform-domain expressions, as before. Since they are identical (except for the number of filters), we give just one as an example. In the z-transform domain, the output of one single branch is

$$\frac{1}{N}\, G_i(z) \sum_{k=0}^{N-1} G_i(W_N^{-k} z^{-1})\, X(W_N^k z).$$

Summing these over all branches, $i = 0, 1, \ldots, M-1$, we get

$$X(z) = \sum_{i=0}^{M-1} \frac{1}{N}\, G_i(z) \sum_{k=0}^{N-1} G_i(W_N^{-k} z^{-1})\, X(W_N^k z) = \frac{1}{N} \sum_{k=0}^{N-1} \Bigl( \sum_{i=0}^{M-1} G_i(z)\, G_i(W_N^{-k} z^{-1}) \Bigr) X(W_N^k z).$$

Therefore, for perfect reconstruction, the coefficient of the term with $X(z)$ must equal $N$, while all the others (aliasing terms) must cancel, that is:

$$\sum_{i=0}^{M-1} G_i(z)\, G_i(z^{-1}) = N, \qquad \sum_{i=0}^{M-1} G_i(z)\, G_i(W_N^{-k} z^{-1}) = 0, \quad k = 1, 2, \ldots, N-1.$$

For example, for $N = 2$ and $M = 3$, we get:

$$G_0(z)G_0(z^{-1}) + G_1(z)G_1(z^{-1}) + G_2(z)G_2(z^{-1}) = 2,$$
$$G_0(z)G_0(-z^{-1}) + G_1(z)G_1(-z^{-1}) + G_2(z)G_2(-z^{-1}) = 0.$$

Compare this to its counterpart expression in two-channel filter banks in (7.28).
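The two conditions for $N = 2$, $M = 3$ can be checked mechanically by multiplying Laurent polynomials. This is a minimal sketch assuming real-coefficient FIR filters. The filter set is a hypothetical choice, not from the text: three length-2 filters given by unit vectors at 120° scaled by $\sqrt{2/3}$, a block-based tight frame with $N = 2$.

```python
import math

def laurent_terms(g, sign):
    """Coefficients of G(z) G(sign * z^{-1}) for a real causal filter g (sign = +1 or -1).

    With G(z) = sum_n g[n] z^{-n}, we have G(sign * z^{-1}) = sum_m g[m] sign^m z^{m}, so the
    product contributes g[n] g[m] sign^m to the power z^{m - n}. Returns {power of z: coefficient}.
    """
    terms = {}
    for n, gn in enumerate(g):
        for m, gm in enumerate(g):
            terms[m - n] = terms.get(m - n, 0.0) + gn * gm * sign ** m
    return terms

def summed_terms(filters, sign):
    """Add the Laurent coefficients over all channels i = 0..M-1."""
    total = {}
    for g in filters:
        for power, value in laurent_terms(g, sign).items():
            total[power] = total.get(power, 0.0) + value
    return total

# Hypothetical block filters (length N = 2): three unit vectors at 120 degrees, scaled by sqrt(2/3)
SCALE = math.sqrt(2.0 / 3.0)
BLOCK = [[SCALE * math.cos(2 * math.pi * i / 3), SCALE * math.sin(2 * math.pi * i / 3)]
         for i in range(3)]

if __name__ == "__main__":
    print("sum G_i(z) G_i(z^-1): ", summed_terms(BLOCK, +1))   # 2 at power 0, ~0 elsewhere
    print("sum G_i(z) G_i(-z^-1):", summed_terms(BLOCK, -1))   # ~0 everywhere (no aliasing)
```

The same two functions can be applied to any other candidate set of real filters with $N = 2$.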
Geometry of the Expansion Analogously to bases, each branch (channel) projects onto a subspace of $\ell^2(\mathbb{Z})$ we call $V_0$ or $W_i$, $i = 1, 2, \ldots, M-1$.¹⁴³ While each of these is on its own an orthogonal projection (because $P_V$ in (7.18) is an orthogonal projection operator), they are not orthogonal to each other because of oversampling. Each of the orthogonal projection operators is given as

$$P_V = G_0 U_N D_N G_0^*, \qquad P_{W_i} = G_i U_N D_N G_i^*, \quad i = 1, 2, \ldots, M-1,$$

with the ranges

$$V_0 = \operatorname{span}(\{g_{0,n-Nk}\}_{k\in\mathbb{Z}}), \qquad W_i = \operatorname{span}(\{g_{i,n-Nk}\}_{k\in\mathbb{Z}}), \quad i = 1, 2, \ldots, M-1.$$

10.3.2 Polyphase View of Oversampled Filter Banks 

To cover the polyphase view for general $N$ and $M$, we work through an example with $N = 2$, $M = 3$; expressions for general $N$ and $M$ follow easily.

Example 10.5 (Tight oversampled 3-channel filter banks) For two-channel filter banks, a polyphase decomposition is achieved by simply splitting both sequences and filters into their even- and odd-indexed subsequences; this is governed by the sampling factor. In an oversampled tight filter bank with $N = 2$ and $M = 3$, we still do the same; the difference is going to be in the number of filters, as before. We have already seen how to decompose an input sequence in (2.210), synthesis filters in (7.32), and analysis filters in (7.34). In our context, these polyphase decompositions are the same, except that for filters, we have more of them involved:

$$g_{i,0,n} = g_{i,2n} \;\overset{\text{ZT}}{\longleftrightarrow}\; G_{i,0}(z) = \sum_{n\in\mathbb{Z}} g_{i,2n}\, z^{-n},$$
$$g_{i,1,n} = g_{i,2n+1} \;\overset{\text{ZT}}{\longleftrightarrow}\; G_{i,1}(z) = \sum_{n\in\mathbb{Z}} g_{i,2n+1}\, z^{-n},$$
$$G_i(z) = G_{i,0}(z^2) + z^{-1} G_{i,1}(z^2),$$

for $i = 0, 1, 2$ and the synthesis filters. That is, we have 3 filters with 2 polyphase components each, leading to the following synthesis polyphase matrix $\Phi_p(z)$:

$$\Phi_p(z) = \begin{bmatrix} G_{0,0}(z) & G_{1,0}(z) & G_{2,0}(z) \\ G_{0,1}(z) & G_{1,1}(z) & G_{2,1}(z) \end{bmatrix}.$$

As expected, the polyphase matrix is no longer square; rather, it is a $(2 \times 3)$ matrix of polynomials. Similarly, on the analysis side, since this is a filter bank implementing a tight frame with $\widetilde{\Phi} = \Phi$, we assume the same filters as on the

143 We assume here that the space Vo is lowpass in nature, while the Wi are bandpass. 
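synthesis side, only time reversed,

$$\tilde g_{i,0,n} = \tilde g_{i,2n} = g_{i,-2n} \;\overset{\text{ZT}}{\longleftrightarrow}\; \widetilde{G}_{i,0}(z) = \sum_{n\in\mathbb{Z}} g_{i,-2n}\, z^{-n},$$
$$\tilde g_{i,1,n} = \tilde g_{i,2n-1} = g_{i,-2n+1} \;\overset{\text{ZT}}{\longleftrightarrow}\; \widetilde{G}_{i,1}(z) = \sum_{n\in\mathbb{Z}} g_{i,-2n+1}\, z^{-n},$$
$$\widetilde{G}_i(z) = \widetilde{G}_{i,0}(z^2) + z\, \widetilde{G}_{i,1}(z^2),$$

for $i = 0, 1, 2$. With this definition, the analysis polyphase matrix is, similarly to the one for the two-channel case:

$$\widetilde{\Phi}_p(z) = \begin{bmatrix} G_{0,0}(z^{-1}) & G_{1,0}(z^{-1}) & G_{2,0}(z^{-1}) \\ G_{0,1}(z^{-1}) & G_{1,1}(z^{-1}) & G_{2,1}(z^{-1}) \end{bmatrix} = \Phi_p(z^{-1}),$$

where $\widetilde{\Phi}_p(z)$ is again a $(2 \times 3)$ matrix of polynomials.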

As before, this type of representation allows for a very compact input-output relationship between the input (decomposed into polyphase components) and the result coming out of the synthesis filter bank:

$$X(z) = \begin{bmatrix} 1 & z^{-1} \end{bmatrix} \Phi_p(z^2)\, \Phi_p^*(z^{-2}) \begin{bmatrix} X_0(z^2) \\ X_1(z^2) \end{bmatrix},$$

where we have again used the Hermitian transpose because we will often deal with complex-coefficient filter banks in this chapter. The above is formally the same as the expression for a critically-sampled filter bank with 2 channels; the oversampling is hidden in the dimensions of the rectangular matrices $\Phi_p$ and $\widetilde{\Phi}_p$. Clearly, for perfect reconstruction, $\Phi_p(z^2)\Phi_p^*(z^{-2})$ must be an identity, analogously to orthogonal filter banks. This result for tight frames is formalized in Theorem 10.8.

The above example went through various polyphase concepts for a tight oversampled 3-channel filter bank. For general oversampled filter banks with $N$, $M$, the expressions are the same as those given in (8.12c)–(8.12e), except with $M$ filters instead of $N$. The corresponding polyphase matrices are each of size $N \times M$.

Frame Operators All the frame operators we have seen so far can be expressed via filter-bank ones as well.

The frame operator $T$ for a general infinite-dimensional frame is formally defined as for the finite-dimensional one in (10.57), except that it is now infinite-dimensional itself. Its polyphase counterpart is:

$$T_p(z) = \Phi_p(z)\, \Phi_p^*(z^{-1}). \tag{10.80}$$

For a tight frame implemented by a tight oversampled filter bank, this has to be an identity, as we have already said in the above example. In other words, $\Phi_p$ is a rectangular paraunitary matrix. The frame operator $T_p(z)$ is positive definite on the unit circle:

$$T_p(e^{j\omega}) = \Phi_p(e^{j\omega})\, \bigl[\Phi_p(e^{j\omega})\bigr]^* > 0. \tag{10.81}$$

The canonical dual frame operator has its polyphase counterpart in:

$$\widetilde{\Phi}_p(z) = T_p(z)^{-1}\, \Phi_p(z). \tag{10.82}$$

Again, we can see that when the frame is tight, $T_p(z) = I$, and the dual polyphase matrix is the same as $\Phi_p(z)$.

Polyphase Decomposition of an Oversampled Filter Bank As before, the polyphase formulation allows us to characterize classes of solutions. The following theorem, the counterpart of Theorem 8.1 for critically-sampled filter banks, summarizes these without proof; pointers to the proofs are given in Further Reading.

Theorem 10.8 (Oversampled M-channel filter banks in polyphase domain) Given is an $M$-channel filter bank with sampling by $N$ and the polyphase matrices $\Phi_p(z)$, $\widetilde{\Phi}_p(z)$. Then:

(i) Frame expansion in polyphase domain

A filter bank implements a general frame expansion if and only if

$$\Phi_p(z)\, \widetilde{\Phi}_p^*(z^{-1}) = I. \tag{10.83a}$$

A filter bank implements a tight frame expansion if and only if

$$T_p(z) = \Phi_p(z)\, \Phi_p^*(z^{-1}) = I, \tag{10.83b}$$

that is, $\Phi_p(z)$ is paraunitary.

(ii) Naimark's theorem in polyphase domain

An infinite-dimensional frame implementable via an $M$-channel filter bank with sampling by $N$ is a general frame if and only if there exists a biorthogonal basis implementable via an $M$-channel filter bank with sampling by $M$ so that

$$\Phi_p^*(z) = \Psi_p(z)[J], \tag{10.84}$$

where $J \subset \{0, \ldots, M-1\}$ is the index set of the retained columns of $\Psi_p(z)$, and $\Phi_p(z)$, $\Psi_p(z)$ are the frame/basis polyphase matrices, respectively.

An infinite-dimensional frame implementable via an $M$-channel filter bank with sampling by $N$ is a tight frame if and only if there exists an orthonormal basis implementable via an $M$-channel filter bank with sampling by $M$ so that (10.84) holds.

(iii) Frame bounds

The frame bounds of a frame implementable by a filter bank are given by:

$$A_{\min} = \min_{\omega\in[-\pi,\pi)} \lambda_{\min}\bigl(T_p(e^{j\omega})\bigr), \tag{10.85a}$$
$$A_{\max} = \max_{\omega\in[-\pi,\pi)} \lambda_{\max}\bigl(T_p(e^{j\omega})\bigr). \tag{10.85b}$$
The last statement on eigenvalues stems from the fact that the frame operator $T$ and its polyphase counterpart $T_p(e^{j\omega})$ are related via a unitary transformation. If the eigenvalues of $T_p(e^{j\omega})$ are defined via $T_p(e^{j\omega})\,v(\omega) = \lambda(\omega)\,v(\omega)$, then the eigenvalues of $T$ and $T_p(e^{j\omega})$ are the same, leading to (10.85).
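A minimal numerical sketch of (10.85) for $N = 2$: evaluate $\Phi_p(e^{j\omega})$ on a frequency grid, form the $2 \times 2$ Hermitian matrix $T_p(e^{j\omega}) = \Phi_p(e^{j\omega})[\Phi_p(e^{j\omega})]^*$, and track its extreme eigenvalues, which have a closed form for $2 \times 2$ matrices. The tight filter set is that of Example 10.6 (which follows) with $\theta_{01} = \pi/3$. The second set, with the third filter halved, is a hypothetical non-tight variant added only for contrast. A finite grid gives estimates of the bounds, not exact values in general.

```python
import cmath
import math

def polyphase_rows(filters, z):
    """Phi_p(z) for N = 2: row j holds G_{i,j}(z) = sum_n g_i[2n + j] z^{-n} for every filter i."""
    return [[sum(c * z ** (-n) for n, c in enumerate(g[j::2])) for g in filters]
            for j in range(2)]

def frame_bounds(filters, grid=1024):
    """Extreme eigenvalues of the 2 x 2 Hermitian T_p(e^{jw}) = Phi_p Phi_p^H over a grid of w."""
    a_min, a_max = math.inf, -math.inf
    for k in range(grid):
        w = -math.pi + 2 * math.pi * k / grid
        r0, r1 = polyphase_rows(filters, cmath.exp(1j * w))
        a = sum(abs(v) ** 2 for v in r0)                      # T_p[0][0]
        d = sum(abs(v) ** 2 for v in r1)                      # T_p[1][1]
        b = sum(u * v.conjugate() for u, v in zip(r0, r1))    # T_p[0][1]
        mean = 0.5 * (a + d)
        rad = math.sqrt((0.5 * (a - d)) ** 2 + abs(b) ** 2)
        a_min = min(a_min, mean - rad)
        a_max = max(a_max, mean + rad)
    return a_min, a_max

C, S = math.cos(math.pi / 3), math.sin(math.pi / 3)
TIGHT = [[C / 2, S, C / 2], [S / 2, -C, S / 2], [0.5, 0.0, -0.5]]   # Example 10.6, theta_01 = pi/3
SCALED = [TIGHT[0], TIGHT[1], [0.25, 0.0, -0.25]]                 # hypothetical: third filter halved

if __name__ == "__main__":
    print("tight frame bounds:   ", frame_bounds(TIGHT))
    print("scaled variant bounds:", frame_bounds(SCALED))
```

For the tight set both bounds equal 1, consistent with (10.83b). Halving one filter pulls the lower bound down to 0.25 (reached at $\omega = \pm\pi$) while the upper bound stays at 1.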

Example 10.6 (Tight oversampled 3-channel filter banks, cont'd) We now set $M = 3$, $N = 2$ and show how one can obtain a linear-phase tight frame with filters of length greater than 2, a solution not possible for critically-sampled filter banks with sampling by 2, as was shown in Proposition 7.12. We know that such a filter bank implementing a tight frame transform must be seeded from an orthogonal filter bank with a $3 \times 3$ paraunitary matrix.

We use such a matrix in the example showing how to parameterize $N$-channel orthogonal filter banks, Example 8.2 with $K = 2$, that is, all polyphase components will be first-degree polynomials in $z^{-1}$. We form a tight frame by deleting its last column and call the resulting frame polyphase matrix $\Phi_p(z)$. Since there are 5 angles involved, the matrix is too big to state explicitly here; instead, we start imposing the linear-phase conditions to reduce the number of degrees of freedom. A simple solution with $\theta_{00} = \pi/2$, $\theta_{11} = \pi/2$, $\theta_2 = \pi/4$, and $\theta_{10} = 3\pi/4$ leads to the first two filters being symmetric of length 3 and the last antisymmetric of length 3. The resulting polyphase matrix is (where we have rescaled the first and third columns by $-1$):

$$\Phi_p(z) = \begin{bmatrix} \tfrac12\cos\theta_{01}\,(1+z^{-1}) & \tfrac12\sin\theta_{01}\,(1+z^{-1}) & \tfrac12(1-z^{-1}) \\ \sin\theta_{01} & -\cos\theta_{01} & 0 \end{bmatrix},$$

leading to the following three filters:

$$G_0(z) = \tfrac12\cos\theta_{01} + \sin\theta_{01}\, z^{-1} + \tfrac12\cos\theta_{01}\, z^{-2},$$
$$G_1(z) = \tfrac12\sin\theta_{01} - \cos\theta_{01}\, z^{-1} + \tfrac12\sin\theta_{01}\, z^{-2},$$
$$G_2(z) = \tfrac12 - \tfrac12 z^{-2}.$$

For example, with $\theta_{01} = \pi/3$, the three resulting filters have reasonable coverage of the frequency axis (see Figure 10.12).
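The frequency coverage can be checked with a short sketch that evaluates the three magnitude responses for $\theta_{01} = \pi/3$. For real filters on the unit circle, the perfect-reconstruction condition $\sum_i G_i(z)G_i(z^{-1}) = N$ becomes $\sum_i |G_i(e^{j\omega})|^2 = 2$, which the sketch also verifies. In this example $G_0$ is lowpass, $G_1$ highpass, and $G_2$ bandpass with its peak at $\omega = \pi/2$.

```python
import cmath
import math

THETA = math.pi / 3                       # theta_01 = pi/3, as in the example
C, S = math.cos(THETA), math.sin(THETA)
FILTERS = {
    "G0": [C / 2, S, C / 2],              # symmetric, length 3
    "G1": [S / 2, -C, S / 2],             # symmetric, length 3
    "G2": [0.5, 0.0, -0.5],               # antisymmetric, length 3
}

def freq_response(g, w):
    """G(e^{jw}) = sum_n g[n] e^{-jwn}."""
    return sum(gn * cmath.exp(-1j * w * n) for n, gn in enumerate(g))

def magnitude_table(points=9):
    """Rows of (w, {name: |G(e^{jw})|}, sum of squared magnitudes) for w evenly spaced in [0, pi]."""
    rows = []
    for k in range(points):
        w = math.pi * k / (points - 1)
        mags = {name: abs(freq_response(g, w)) for name, g in FILTERS.items()}
        rows.append((w, mags, sum(m ** 2 for m in mags.values())))
    return rows

if __name__ == "__main__":
    for w, mags, total in magnitude_table():
        cols = "  ".join(f"|{name}| = {m:.3f}" for name, m in mags.items())
        print(f"w = {w:.3f}  {cols}  sum of squares = {total:.3f}")
```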



10.4 Local Fourier Frames 

Until now, the material in this chapter covered finite-dimensional frames (Section 10.2) and oversampled filter banks as a vehicle for implementing both finite-dimensional and certain infinite-dimensional frames (previous section). We now investigate a more specific class of frames: those obtained by modulating (shifting in frequency) a single prototype filter/frame vector, introduced in their basis form in Chapter 8. These are some of the oldest bases and frames, and some of the most widely used. The local Fourier expansions arose in response to the need to create a local Fourier tool, able to achieve some localization in time at the price of worsening the known excellent localization in frequency.

Figure 10.12: Tight oversampled 3-channel filter bank with sampling by $N = 2$ and linear-phase filters. The figure depicts the three magnitude responses.

As in Chapter 8, we will consider two large classes of local Fourier frames: those obtained by complex-exponential modulation and those obtained by cosine modulation of a single prototype filter/frame vector. In Chapter 8 we learned that, while there exist no good local Fourier bases (apart from those equivalent to a finite-dimensional basis), there do exist good local cosine bases. In this section, we go even further; we show that there exist good local Fourier frames, due to the extra freedom that redundancy buys us.

10.4.1 Complex Exponential-Modulated Local Fourier Frames 

Complex-exponential modulation is used in many instances, such as the DFT basis, (8.2), (8.5), as well as the basis constructed from the ideal filters (8.6), and is at the heart of the local Fourier expansion known as the Gabor transform. The term Gabor frame is often used to describe any frame with complex-exponential modulation and overlapping frame vectors (oversampled filter banks with filters longer than the sampling factor $N$). For complex exponential-modulated bases, we defined this modulation in (8.16); for complex exponential-modulated frames, we do it now.

Complex-Exponential Modulation Given a prototype filter $p = g_0$, the rest of the filters are obtained via complex-exponential modulation:

$$g_{i,n} = p_n\, e^{j(2\pi/M)in} = p_n\, W_M^{-in}, \tag{10.86}$$
$$G_i(z) = P(W_M^i z), \qquad G_i(e^{j\omega}) = P\bigl(e^{j(\omega - (2\pi/M)i)}\bigr) = P(W_M^i e^{j\omega}),$$

for $i = 1, 2, \ldots, M-1$. A filter bank implementing such a frame expansion is often called a complex exponential-modulated oversampled filter bank. While the prototype filter $p = g_0$ is typically real, the rest of the bandpass filters are complex. The above is identical to the expression for bases, (8.16); the difference is in the sampling factor $N$, smaller here than the number of filters $M$.
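As a sanity check of (10.86), the sketch below modulates a prototype and compares each $G_i(e^{j\omega})$ with the shifted prototype response $P(e^{j(\omega - 2\pi i/M)})$. The prototype taps are arbitrary placeholders, not a designed filter.

```python
import cmath
import math

def modulate(p, M):
    """Complex-exponential modulation as in (10.86): g_i[n] = p[n] exp(j 2 pi i n / M)."""
    return [[pn * cmath.exp(2j * math.pi * i * n / M) for n, pn in enumerate(p)]
            for i in range(M)]

def dtft(g, w):
    """G(e^{jw}) = sum_n g[n] e^{-jwn} for a finite causal filter."""
    return sum(gn * cmath.exp(-1j * w * n) for n, gn in enumerate(g))

if __name__ == "__main__":
    p = [0.25, 0.5, 0.25, 0.1]   # hypothetical real prototype, for illustration only
    M = 3
    g = modulate(p, M)
    w = 0.7
    for i in range(M):
        lhs = dtft(g[i], w)
        rhs = dtft(p, w - 2 * math.pi * i / M)   # P(e^{j(w - 2 pi i / M)})
        print(i, abs(lhs - rhs))
```

Only $g_0$ equals the (real) prototype; the other filters have complex taps, as noted above.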

Overcoming the Limitations of the Balian–Low Theorem In Chapter 8, Theorem 8.2, we saw that there does not exist a complex exponential-modulated local Fourier basis implementable by an $N$-channel FIR filter bank, except for a filter bank with filters of length $N$. We illustrated the proof with an example for $N = 3$ in (8.17) and demonstrated that the only solution consisted of each polyphase component being a monomial, leading to a block-based expansion.

We now investigate what happens with frames. Start with the polyphase representation (8.12d) of the prototype filter $p = g_0$,

$$P(z) = P_0(z^N) + z^{-1}P_1(z^N) + \cdots + z^{-(N-1)}P_{N-1}(z^N),$$

where $P_i(z)$, $i = 0, 1, \ldots, N-1$, are its polyphase components. The modulated versions become

$$G_i(z) = P(W_M^i z) = P_0(W_M^{iN} z^N) + W_M^{-i} z^{-1} P_1(W_M^{iN} z^N) + \cdots + W_M^{-(N-1)i} z^{-(N-1)} P_{N-1}(W_M^{iN} z^N),$$

for $i = 1, 2, \ldots, M-1$. On a simple example, we now show that relaxing the basis requirement allows us to implement a tight frame expansion via an oversampled filter bank with FIR filters longer than the sampling factor $N$.

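Example 10.7 (Overcoming limitations of the Balian–Low theorem) Let $N = 2$ and $M = 3$. The polyphase matrix corresponding to the complex exponential-modulated filter bank is given by

$$\Phi_p(z) = \begin{bmatrix} P_0(z) & P_0(W_3^2 z) & P_0(W_3 z) \\ P_1(z) & W_3^2 P_1(W_3^2 z) & W_3 P_1(W_3 z) \end{bmatrix} = \begin{bmatrix} 1&1&1&0&0&0 \\ 0&0&0&1&1&1 \end{bmatrix} \begin{bmatrix} P_u(z) & 0 \\ 0 & P_\ell(z) \end{bmatrix} \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \\ 1&0&0 \\ 0&W_3^2&0 \\ 0&0&W_3 \end{bmatrix}, \tag{10.87}$$

with $P_u(z)$ and $P_\ell(z)$ the diagonal matrices of polyphase components:

$$P_u(z) = \operatorname{diag}([P_0(z), P_0(W_3^2 z), P_0(W_3 z)]), \qquad P_\ell(z) = \operatorname{diag}([P_1(z), P_1(W_3^2 z), P_1(W_3 z)]),$$

and $W_3^{-1} = W_3^2$, $W_3^{-2} = W_3$. Compare (10.87) to its basis counterpart in (8.17).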
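We now want to see whether it is possible for such a frame polyphase matrix to implement a tight frame, in which case it would have to satisfy (10.83b):

$$\Phi_p(z)\,\Phi_p^*(z^{-1}) = \begin{bmatrix} 1&1&1&0&0&0 \\ 0&0&0&1&1&1 \end{bmatrix} \begin{bmatrix} P_u(z) & 0 \\ 0 & P_\ell(z) \end{bmatrix} \begin{bmatrix} I & W^{-1} \\ W & I \end{bmatrix} \begin{bmatrix} P_u^*(z^{-1}) & 0 \\ 0 & P_\ell^*(z^{-1}) \end{bmatrix} \begin{bmatrix} 1&0 \\ 1&0 \\ 1&0 \\ 0&1 \\ 0&1 \\ 0&1 \end{bmatrix}$$
$$\overset{(a)}{=} \begin{bmatrix} 1&1&1&0&0&0 \\ 0&0&0&1&1&1 \end{bmatrix} \begin{bmatrix} P_u(z)P_u^*(z^{-1}) & W^{-1}P_u(z)P_\ell^*(z^{-1}) \\ W P_\ell(z)P_u^*(z^{-1}) & P_\ell(z)P_\ell^*(z^{-1}) \end{bmatrix} \begin{bmatrix} 1&0 \\ 1&0 \\ 1&0 \\ 0&1 \\ 0&1 \\ 0&1 \end{bmatrix}$$
$$= \begin{bmatrix} \sum_{i=0}^{2} P_0(W_3^i z)\,P_0(W_3^{-i} z^{-1}) & \sum_{i=0}^{2} W_3^{-i}\,P_0(W_3^i z)\,P_1(W_3^{-i} z^{-1}) \\ \sum_{i=0}^{2} W_3^{i}\,P_1(W_3^i z)\,P_0(W_3^{-i} z^{-1}) & \sum_{i=0}^{2} P_1(W_3^i z)\,P_1(W_3^{-i} z^{-1}) \end{bmatrix} = I,$$

where $W = \operatorname{diag}([1, W_3^2, W_3])$, $P_u^*(z^{-1}) = \operatorname{diag}([P_0(z^{-1}), P_0(W_3 z^{-1}), P_0(W_3^2 z^{-1})])$ and similarly for $P_\ell^*(z^{-1})$; (a) follows because diagonal matrices commute, the last step re-indexes the sums using $W_3^{-1} = W_3^2$ and $W_3^{-2} = W_3$, and we assumed that $p$ is real. It is clear that the set of conditions above is much less restrictive than that of every polyphase component of the prototype filter having to be a monomial (the condition that led to the negative result in the discrete Balian–Low theorem, Theorem 8.2).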

For example, we see that the conditions on each polyphase component,

$$\sum_{i=0}^{2} P_0(W_3^i z)\,P_0(W_3^{-i} z^{-1}) = 1, \qquad \sum_{i=0}^{2} P_1(W_3^i z)\,P_1(W_3^{-i} z^{-1}) = 1,$$

are equivalent to those polyphase components being orthogonal filters as in (8.7). On the other hand, the conditions involving both polyphase components,

$$\sum_{i=0}^{2} W_3^{-i}\,P_0(W_3^i z)\,P_1(W_3^{-i} z^{-1}) = 0, \qquad \sum_{i=0}^{2} W_3^{i}\,P_0(W_3^{-i} z^{-1})\,P_1(W_3^i z) = 0,$$

are equivalent to $P_0(z)$ and $z^{-1}P_1(z)$ being orthogonal to each other as in (8.9).

For example, we know that the rows of ( 1 1 . 3 J ) are orthogonal filters (since 

it is a tight frame and the rows are orthonormal vectors from a 3 x 3 unitary 







Figure 10.13: Magnitude response of the prototype filter P(z) of length 5. 



matrix via Naimark's theorem), so we can take (with normalization)

We can now get the prototype filter $P(z)$ as

$$P(z) = P_0(z^2) + z^{-1}P_1(z^2) = \frac{1}{3\sqrt{2}}\bigl(2 + \sqrt{3}\,z^{-1} - z^{-2} - \sqrt{3}\,z^{-3} - z^{-4}\bigr),$$

a solution longer than $N = 2$, with the magnitude response as in Figure 10.13. Another example, with $N = 2$ and $M = 4$, is left as Exercise 10.9.



Application to Power Spectral Density Estimation In Chapter 8, Section 8.3.2, we discussed the computation of periodograms as a widely used application of complex exponential-modulated filter banks. It is a process of estimating and computing the local power spectral density, and it has a natural filter-bank implementation described in the same section: the prototype filter $p$ computes the windowing, and the modulation computes the DFT (see Figure 8.9 and Table 8.1). The downsampling factor $N$ can be smaller than $M$, which is when we have a frame. For example, with $N = M/2$, we have 50% overlap, and if $N = 1$ (that is, no downsampling) we are computing a sliding-window DFT (with an overlap of $(M-1)/M$). When both the time redundancy and the number of frequencies increase, this time-frequency frame approaches a continuous transform called the local Fourier transform, treated in detail in Chapter 11. A typical example for calculating the periodogram of a speech signal uses $M = 64$, $N = 32$ (or 50% overlap) and a Hamming window. No averaging of the power spectral density coefficients is used. The result is shown in Figure 10.14. This display is often called a spectrogram in the speech processing literature. From this figure, one clearly sees the time-frequency behavior typical of signals that have time-varying spectra.
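The filter-bank view of the spectrogram reduces to windowing frames taken every $N$ samples and computing an $M$-point DFT of each. The sketch below uses the parameters quoted above ($M = 64$, $N = 32$, Hamming window, no averaging) but is only an illustration. It uses the common symmetric Hamming definition, omits any normalization, and replaces the speech segment of Figure 10.14 with a hypothetical test signal that switches from one tone to another.

```python
import cmath
import math

def hamming(M):
    """Symmetric Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (M - 1)), n = 0..M-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]

def spectrogram(x, M=64, N=32):
    """Squared magnitudes of the M-point DFT of Hamming-windowed frames taken every N samples.

    Row t holds |sum_n w[n] x[tN + n] e^{-j 2 pi k n / M}|^2 for k = 0..M-1.
    """
    w = hamming(M)
    rows = []
    for start in range(0, len(x) - M + 1, N):
        seg = [w[n] * x[start + n] for n in range(M)]
        rows.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))) ** 2
                     for k in range(M)])
    return rows

def peak_bin(row):
    """Index of the largest value among the nonnegative-frequency bins 0..M/2."""
    half = len(row) // 2 + 1
    return max(range(half), key=lambda k: row[k])

if __name__ == "__main__":
    # Hypothetical test signal: a tone at bin 8 for 256 samples, then a tone at bin 20
    x = [math.cos(2 * math.pi * (8 if n < 256 else 20) * n / 64) for n in range(512)]
    S = spectrogram(x)
    print("frames:", len(S), "peak bins:", [peak_bin(r) for r in S])
```

With $N = M/2$, each sample contributes to two frames, which is the factor-of-2 redundancy of this frame.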

10.4.2 Cosine-Modulated Local Fourier Frames 

In Chapter 8 we saw that a possible escape from the restriction imposed by the discrete Balian–Low theorem was to replace complex-exponential modulation with an appropriate cosine modulation, with the added advantage that all filters are real if the prototype is real. While frames in general offer another such escape, cosine-modulated frames provide even more options.

Figure 10.14: Spectrogram of a speech segment. 64 frequency bins are evaluated between 0 and 4 kHz, and a triangular window with 50% overlap is used.

Cosine Modulation Given a prototype filter $p$, one of the possible ways to use the cosine modulation is (other ways leading to different classes of cosine-modulated filter banks exist; see Further Reading for pointers):

$$g_{i,n} = p_n \cos\Bigl(\frac{2\pi}{2M}\bigl(i + \tfrac12\bigr)n + \theta_i\Bigr) = \frac12\, p_n \Bigl( e^{j\theta_i} W_{2M}^{-(i+1/2)n} + e^{-j\theta_i} W_{2M}^{(i+1/2)n} \Bigr), \tag{10.88}$$
$$G_i(z) = \frac12 \Bigl( e^{j\theta_i} P\bigl(W_{2M}^{(i+1/2)} z\bigr) + e^{-j\theta_i} P\bigl(W_{2M}^{-(i+1/2)} z\bigr) \Bigr),$$
$$G_i(e^{j\omega}) = \frac12 \Bigl( e^{j\theta_i} P\bigl(e^{j(\omega - (2\pi/2M)(i+1/2))}\bigr) + e^{-j\theta_i} P\bigl(e^{j(\omega + (2\pi/2M)(i+1/2))}\bigr) \Bigr),$$

for $i = 0, 1, \ldots, M-1$, where $\theta_i$ is a phase factor that gives us flexibility in designing the representation. Compare the above with (10.86) for the complex-exponential modulation; the difference is that, given a real prototype filter, all the other filters are real. Compare it also with (8.27) for the cosine modulation in bases. The two expressions are identical; the difference is in the sampling factor $N$, smaller here than the number of filters $M$.
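The two-exponential form in (10.88) can be checked numerically with a sketch like the following. The prototype taps and phase factors $\theta_i$ are arbitrary illustrative values, not the choice (8.29) used below. The point is only that the cosine-modulated filters are real and that their DTFT equals the average of two frequency-shifted, phase-rotated copies of $P(e^{j\omega})$.

```python
import cmath
import math

def cosine_modulate(p, M, thetas):
    """g_i[n] = p[n] cos((2 pi / 2M)(i + 1/2) n + theta_i), i = 0..M-1, as in (10.88)."""
    return [[pn * math.cos((math.pi / M) * (i + 0.5) * n + thetas[i]) for n, pn in enumerate(p)]
            for i in range(M)]

def dtft(g, w):
    """G(e^{jw}) = sum_n g[n] e^{-jwn}."""
    return sum(gn * cmath.exp(-1j * w * n) for n, gn in enumerate(g))

def two_shift_formula(p, M, i, theta, w):
    """Right-hand side of (10.88): average of two frequency-shifted, phase-rotated copies of P."""
    shift = (math.pi / M) * (i + 0.5)   # (2 pi / 2M)(i + 1/2)
    return 0.5 * (cmath.exp(1j * theta) * dtft(p, w - shift)
                  + cmath.exp(-1j * theta) * dtft(p, w + shift))

if __name__ == "__main__":
    p = [0.2, 0.5, 0.9, 0.5, 0.2]        # hypothetical real prototype
    thetas = [0.3, -0.7, 1.1, 0.0]       # hypothetical phase factors
    g = cosine_modulate(p, 4, thetas)
    err = max(abs(dtft(g[i], w) - two_shift_formula(p, 4, i, thetas[i], w))
              for i in range(4) for w in (-2.5, -0.4, 0.0, 1.3, 3.1))
    print("max deviation from (10.88):", err)
```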

Matrix View We look at a particular class of cosine-modulated frames, those with filters of length $L = 2N$, a natural extension of the LOTs from Section 8.4.1 (see also Further Reading). We choose the same phase factor as in (8.29), leading to



/2* 



1 w M - 



'10.89) 



for $i = 0, 1, \ldots, M-1$, $n = 0, 1, \ldots, 2N-1$. We know that for a rectangular prototype window, $p_n = 1/\sqrt{M}$, the above filters form a tight frame, since they were obtained directly by seeding the LOT with the rectangular prototype window (compare (10.89) to (8.30)). We follow the same analysis as we did in Section 8.4.1.
As in (8.34), we can express the frame matrix $\Phi$ as

$$\Phi = \begin{bmatrix} \ddots & & & \\ & G_0 & & \\ & G_1 & G_0 & \\ & & G_1 & \ddots \end{bmatrix}, \tag{10.90}$$

except that the stacked blocks $\begin{bmatrix} G_0 \\ G_1 \end{bmatrix}$ that contain the synthesis filters' impulse responses are now of size $2N \times M$ (instead of $2N \times N$). Given that the frame is tight (and real),

$$G_0 G_0^T + G_1 G_1^T = I, \tag{10.91a}$$
$$G_1 G_0^T = 0. \tag{10.91b}$$

Assume we want to impose a prototype window; then, as in (8.41), the windowed impulse responses are $G_0' = P_0 G_0$ and $G_1' = P_1 G_1$, where $P_0$ and $P_1$ are the $N \times N$ diagonal matrices with the left and right tails of the prototype window $p$ on the diagonal, and if the prototype window is symmetric, $P_1 = J P_0 J$. We can thus substitute $G_0'$ and $G_1'$ into (10.91a) to obtain the condition for the resulting frame to be tight:

$$G_0' G_0'^T + G_1' G_1'^T = P_0 G_0 G_0^T P_0 + P_1 G_1 G_1^T P_1 = P_0 G_0 G_0^T P_0 + J P_0 J\, G_1 G_1^T\, J P_0 J = I.$$

Unlike for the LOTs, $G_0 G_0^T$ has no special structure now; its elements are given by



(GqGq )i 



(*&£±ll) i sin (l(l_!i)) 



2M 



sin (E(H^l) ^sin(^l)' 



where the notation $t$ for the elements of $G_0 G_0^T$ is evocative of the frame operator $T = \Phi\Phi^*$. This leads to the following conditions on the prototype window:

$$t_{n,n}\, p_n^2 + (1 - t_{n,n})\, p_{N-n-1}^2 = 1, \qquad p_n p_k = p_{N-n-1}\, p_{N-k-1},$$

for $n = 0, 1, \ldots, N-1$, $k = 0, 1, \ldots, N-1$, $k \neq n$. We can fix one coefficient; let us choose $p_0 = -1$; then $p_{N-1} = \pm 1$ and $p_k = -p_{N-1}\, p_{N-k-1}$ for $k = 1, 2, \ldots, N-2$. A possible solution for the prototype window satisfying the above conditions is

$$p_n = \begin{cases} -\cos\bigl(\frac{\pi n}{2k}\bigr), & N = 2k+1; \\ -\cos\bigl(\frac{\pi n}{2k-1}\bigr), & N = 2k, \end{cases}$$

for $n = 0, 1, \ldots, N-1$; an example window design is given in Figure 10.15, with coefficients as in Table 10.4.
Figure 10.15: An example prototype window design for $N = 7$.

    n      0       1        2      3     4      5       6
    p_n   -1    -√3/2    -1/2     0    1/2    √3/2     1

Table 10.4: Prototype window used in Figure 10.15. The prototype window is symmetric, so only half of the coefficients are shown.
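The window of Table 10.4 can be regenerated as a sketch, assuming odd $N$ so that the first case of the formula, $p_n = -\cos(\pi n/(N-1))$, applies. The sketch checks only the endpoint values and the cross conditions $p_n p_k = p_{N-n-1}p_{N-k-1}$. It does not check the diagonal conditions, since those involve $t_{n,n}$, whose expression is not reproduced here.

```python
import math

def prototype_window(N):
    """Left half p_0..p_{N-1} of the prototype window, p_n = -cos(pi n / (N - 1)).

    For odd N = 2k + 1 this is -cos(pi n / (2k)), the first case of the formula above.
    """
    if N < 3 or N % 2 == 0:
        raise ValueError("this sketch covers only odd N >= 3")
    return [-math.cos(math.pi * n / (N - 1)) for n in range(N)]

def check_cross_conditions(p, tol=1e-12):
    """Check p_n p_k = p_{N-n-1} p_{N-k-1} for all n != k."""
    N = len(p)
    return all(abs(p[n] * p[k] - p[N - n - 1] * p[N - k - 1]) < tol
               for n in range(N) for k in range(N) if k != n)

if __name__ == "__main__":
    p = prototype_window(7)
    print([round(v, 4) for v in p])
    print("p_0 =", p[0], " p_6 =", p[-1], " cross conditions hold:", check_cross_conditions(p))
```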



10.5 Wavelet Frames 

We now move from the Fourier-like frames to those that are wavelet-like. We have seen examples of moving from bases to frames (DFT to harmonic tight frame, for example; see Table 10.7), and we would like to do that in the wavelet case as well. We start with the most obvious way to generate a frame from the DWT: by removing some downsamplers. Then we move on to the predecessor of wavelet frames originating in the work of Burt and Adelson on pyramid coding, and close the section with the fully redundant frames called the shift-invariant DWT.



10.5.1 Oversampled DWT 

How do we add redundancy starting from the DWT? We already mentioned that an obvious way to do that was to remove some downsamplers, thereby getting a finer time localization. Consider Figure 10.16(a), showing the sampling grid for the DWT (corresponding to the wavelet tiling from Figure 9.7(d)): at each subsequent level, only half of the points are present (half of the basis functions exist at that scale). Ideally, we would like to insert, for each scale, additional points (one point between every two). This can be achieved by having a DWT tree with the samplers removed at all free branches (see Figure 10.17). We call this scheme the oversampled DWT, also known as the partial DWT (see Further Reading). The redundancy of this scheme at level $\ell$ is $A_\ell = 2$, for a total redundancy of $A = 2$. The sampling grid with $J = 4$ is depicted in Figure 10.16(b).

Example 10.8 (Oversampled DWT) Let us now look at a simple example 
with J = 3. By moving upsamplers across filters, the filter bank in Figure [10.171 
reduces to the one in Figure 10.181 The equivalent filters are then (we leave the 
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2 4 6 



10 12 14 16 18 20 22 24 26 28 30 32 

n 



(a) 



D • □ 

□ ♦□•□•□ 

n.n.n.n.n.n.n.n 



2 4 6 8 1 1 2 1 4 1 6 1 8 20 22 24 26 28 30 32 

n 
(b) 

Figure 10.16: Sampling grids corresponding to the time-frequency tilings of (a) the 
DWT (points — nonredundant) and (b) the oversampled DWT (squares — redundant). 
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G 
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<+>••-([ 




Figure 10.17: The synthesis part of the filter bank implementing the oversampled DWT. 
The samplers are omitted at all the inputs into the bank. The analysis part is analogous. 



H(z*) 




level 3 


G 


G(z 4 ) 





H(z 2 ) 



level 2 



G> 



G{z 2 



H(z) 



level 1 



G> 



G(z) 



Figure 10.18: The synthesis part of the equivalent filter bank implementing the over- 
sampled DWT with J = 3 levels. The analysis part is analogous. 
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filter bank in its tree form as this is how it is actually implemented) | 145 | 

H {X) {z) = H(z), (10.92a) 

H {2 \z) = G(z)H(z 2 ), (10.92b) 

H^(z) = G{z)G{z 2 )H{z 4 ), (10.92c) 

G^(z) = G{z)G{z 2 )G{z 4 ), (10.92d) 

and the frame can be expressed as 

$ = {/£ fc X-2fe.^4*»^4*W ( 10 - 93 ) 

The template vector h^{(1)} moves by 1, h^{(2)} moves by multiples of 2, and h^{(3)} and
g^{(3)} move by multiples of 4. Thus, the basic block of the infinite matrix is of
size 8 × 16 (the smallest period after which it starts repeating itself, redundancy
of 2), and it moves by multiples of 8. However, even for filters such as Haar,
for which the DWT would become a block transform (the infinite matrix Φ is
block diagonal, see (9.4)), here this is not the case. Substituting Haar filters (see
Table \TM) into the expressions for H^{(1)}, H^{(2)}, H^{(3)} and G^{(3)} above, we get

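$$H^{(1)}(z) = \tfrac{1}{\sqrt{2}}\,(1 - z^{-1}),$$

$$H^{(2)}(z) = \tfrac{1}{2}\,(1 + z^{-1} - z^{-2} - z^{-3}),$$

$$H^{(3)}(z) = \tfrac{1}{2\sqrt{2}}\,(1 + z^{-1} + z^{-2} + z^{-3} - z^{-4} - z^{-5} - z^{-6} - z^{-7}),$$

$$G^{(3)}(z) = \tfrac{1}{2\sqrt{2}}\,(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} + z^{-6} + z^{-7}).$$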

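Renaming the template frame vectors, we can rewrite the frame Φ as

$$\varphi_{k,n} = h^{(1)}_{n-k}, \qquad k = 0, 1, \ldots, 7;$$

$$\varphi_{8+k,n} = h^{(2)}_{n-2k}, \qquad k = 0, 1, 2, 3;$$

$$\varphi_{12+k,n} = h^{(3)}_{n-4k}, \qquad k = 0, 1;$$

$$\varphi_{14+k,n} = g^{(3)}_{n-4k}, \qquad k = 0, 1;$$

$$\Phi = \{\varphi_{i,n-8k}\}_{k\in\mathbb{Z},\; i = 0, 1, \ldots, 15}. \qquad (10.94)$$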
Compare this to the DWT example from Section 9.1.
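As a numerical sanity check of Example 10.8, the sketch below builds the Haar equivalent filters of (10.92) by convolving upsampled filters and then assembles the frame (10.93) on a circular domain of length L. Periodization with L a multiple of 8 is an assumption made here so that the infinite matrix can be truncated without boundary effects. The Haar convention g = [1, 1]/√2, h = [1, −1]/√2 is chosen to agree with H^{(2)} above, and the helper names `upsample` and `oversampled_dwt_frame` are hypothetical.

```python
import numpy as np

def upsample(f, m):
    """(up m) f: insert m - 1 zeros between consecutive taps of f."""
    out = np.zeros((len(f) - 1) * m + 1)
    out[::m] = f
    return out

s = 1 / np.sqrt(2)
g = np.array([s, s])     # Haar lowpass (assumed convention)
h = np.array([s, -s])    # Haar highpass (assumed convention)

# Equivalent filters (10.92a)-(10.92d), built by cascading upsampled filters.
h1 = h
h2 = np.convolve(g, upsample(h, 2))
g2 = np.convolve(g, upsample(g, 2))
h3 = np.convolve(g2, upsample(h, 4))
g3 = np.convolve(g2, upsample(g, 4))

def oversampled_dwt_frame(L):
    """Columns are the frame vectors of (10.93), periodized to length L."""
    cols = []
    for f, step in [(h1, 1), (h2, 2), (h3, 4), (g3, 4)]:
        for shift in range(0, L, step):
            v = np.zeros(L)
            v[(np.arange(len(f)) + shift) % L] = f
            cols.append(v)
    return np.column_stack(cols)

Phi = oversampled_dwt_frame(32)
eigs = np.linalg.eigvalsh(Phi @ Phi.T)   # smallest/largest eigenvalue = frame bounds
print(Phi.shape, eigs.min(), eigs.max())
```

The frame matrix has 64 columns for 32 samples, that is, redundancy 2. The vectors with even shifts form the periodized Haar DWT basis, so the lower frame bound cannot drop below 1; the script simply prints both bounds rather than assuming tightness.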

10.5.2 Pyramid Frames 

Pyramid frames were introduced for coding in 1983 by Burt and Adelson. Although 
redundant, the pyramid coding scheme was developed for compression of images and 
was recognized in the late 1980s as one of the precursors of wavelet octave-band 
decompositions. The scheme works as follows. First, a coarse approximation α is



145 Remember that the superscript (ℓ) denotes the level in the tree.
Figure 10.19: The (a) analysis and (b) synthesis parts of the pyramid filter bank. This
scheme implements a frame expansion. The dashed line indicates the actual implementation:
in reality, the lowest branch would not be implemented; it is shown here for
clarity and for parallelism with two-channel filter banks.

derived (an example of how this could be done is in Figure 10.19).¹⁴⁶ Then, from
this coarse version, the original is predicted (in the figure, this is done by upsampling
and filtering), followed by calculating the prediction error β. If the prediction is good
(as is the case for most natural images, which have a lowpass characteristic), the error
will have a small variance and can thus be well compressed. The process can be
iterated on the coarse version. The outputs of the analysis filter bank are:



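$$\alpha(z) \overset{(a)}{=} \tfrac{1}{2}\left[\tilde{G}(z^{1/2})\,X(z^{1/2}) + \tilde{G}(-z^{1/2})\,X(-z^{1/2})\right], \qquad (10.95a)$$

$$\beta(z) = X(z) - G(z)\,\alpha(z^2) \overset{(b)}{=} X(z) - \tfrac{1}{2}\,G(z)\left[\tilde{G}(z)\,X(z) + \tilde{G}(-z)\,X(-z)\right], \qquad (10.95b)$$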



where (a) follows from (2.196a) and (b) from (7.77a). To reconstruct, we simply
upsample and interpolate the coarse approximation α(z) and add the result back to the
prediction error β(z):

$$G(z)\,\alpha(z^2) + \beta(z) = X(z). \qquad (10.96)$$
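The analysis (10.95) and synthesis (10.96) steps of one pyramid level can be sketched in a few lines. This is a minimal illustration, not the book's implementation: it assumes finite-length signals, causal FIR filters with linear convolution truncated to the input length (so the analysis alignment differs from g̃_n = g_{−n} by a delay), and the hypothetical helper names `predict`, `pyramid_analysis` and `pyramid_synthesis`. The optional `coarse_op` argument passes the coarse approximation through an arbitrary, possibly nonlinear, operator, in the spirit of footnote 146.

```python
import numpy as np

def predict(alpha, g, n):
    """Upsample alpha by 2, interpolate with g, keep the first n samples."""
    up = np.zeros(2 * len(alpha))
    up[::2] = alpha
    return np.convolve(up, g)[:n]

def pyramid_analysis(x, g_tilde, g, coarse_op=lambda a: a):
    """One pyramid level: coarse approximation alpha and prediction error beta."""
    x = np.asarray(x, dtype=float)
    alpha = coarse_op(np.convolve(x, g_tilde)[: len(x)][::2])  # filter, downsample
    beta = x - predict(alpha, g, len(x))                          # prediction error
    return alpha, beta

def pyramid_synthesis(alpha, beta, g):
    """Add the prediction back to the prediction error, as in (10.96)."""
    return predict(alpha, g, len(beta)) + beta

s = 1 / np.sqrt(2)
g = np.array([s, s])   # Haar lowpass, used here for both analysis and synthesis
x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 0.0, 3.0, 7.0])

alpha, beta = pyramid_analysis(x, g, g)
x_rec = pyramid_synthesis(alpha, beta, g)
print(len(alpha) + len(beta), "coefficients for", len(x), "samples")  # redundancy 3/2
```

Reconstruction is exact whatever `coarse_op` does (for example `np.round`), because synthesis adds back exactly the prediction that analysis subtracted.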

Upsampling and interpolating is, however, only one way to obtain the prediction
back at full resolution; any appropriate operator (even a nonlinear one) could have
been used, since the prediction step is inverted simply by adding back what was
subtracted. We can also see that in the figure, the redundancy of the system is 50%;
α is at half resolution while β is at full resolution, that is, after analysis, we have 50%
more samples than we started with. With the analysis given in Figure 10.19(a), we now
have several options:



146 While in the figure the coarse approximation α is obtained by linear filtering
and downsampling, this need not be so; in fact, one of the powerful features of the original scheme
is that any operator can be used, not necessarily a linear one.
[Figure 10.20: panels (a) and (b).]



Figure 10.20: The pyramid filter bank implementing a basis expansion. With {g, g̃, h, h̃}
a biorthogonal set, the scheme implements a biorthogonal basis expansion, while with g
and g̃ orthogonal, that is, g̃_n = g_{−n} and g satisfies (7.13), the scheme implements an
orthonormal basis expansion. The output β from Figure 10.19(a) goes through (a) filtering
and downsampling, creating a new output β′. (b) Synthesis part.



• Synthesis is performed by upsampling and interpolating α by g as in Figure 10.19(b).
In this case, the resulting scheme is clearly redundant, as we have just discussed,
and implements a frame expansion, which can be either:

(i) general, when filters g and g̃ are biorthogonal (they satisfy (7.66)), or,
(ii) tight, when filters g and g̃ are orthogonal, that is, g̃_n = g_{−n} and g satisfies
(7.13). We illustrate this case in Example 10.9.

• The analysis goes through one more stage, as in Figure 10.20(a), and synthesis
is performed as in Figure 10.20(b). In this case, the scheme implements a basis
expansion, which can be either (both are illustrated in Exercise 10.11):

(i) biorthogonal, when filters g and g̃ are biorthogonal, or,
(ii) orthonormal, when filters g and g̃ are orthogonal.

Example 10.9 We use the pyramid filter bank as in Figure 10.19. Let us assume
that g is the Haar lowpass filter from (7.1a) and that g̃_n = g_{−n}. Then we know
from Chapter 7 that β is nothing else but the output of the highpass branch,
given in (II.9b). For every two input samples, α produces one output sample, while
β produces two output samples; hence the redundancy. We can write this as:

$$\begin{bmatrix} \alpha_n \\ \beta_{2n} \\ \beta_{2n+1} \end{bmatrix} = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}}_{\tilde{\Phi}^T} \begin{bmatrix} x_{2n} \\ x_{2n+1} \end{bmatrix}.$$

We know, however, from our previous discussion that the above matrix is the
dual frame matrix Φ̃^T. Since Φ̃Φ̃^T = I, its canonical dual is Φ = (Φ̃Φ̃^T)^{−1}Φ̃ = Φ̃, and thus,
this pyramid scheme implements a tight frame expansion.
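The tightness claim in Example 10.9 is easy to check numerically. The sketch below takes the 3 × 2 block matrix Φ̃^T from the example, verifies Φ̃Φ̃^T = I (so the canonical dual coincides with Φ̃), and reconstructs one input pair from (α_n, β_{2n}, β_{2n+1}); the sample values are arbitrary.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Rows map (x_{2n}, x_{2n+1}) to (alpha_n, beta_{2n}, beta_{2n+1}), as in Example 10.9.
Phi_tilde_T = np.array([[s, s],
                        [0.5, -0.5],
                        [-0.5, 0.5]])
Phi_tilde = Phi_tilde_T.T                          # 2 x 3 frame matrix

gram = Phi_tilde @ Phi_tilde.T                     # frame operator
canonical_dual = np.linalg.inv(gram) @ Phi_tilde   # (Phi~ Phi~^T)^{-1} Phi~

x_pair = np.array([3.0, -1.0])
coeffs = Phi_tilde_T @ x_pair                      # alpha_n, beta_2n, beta_2n+1
x_back = canonical_dual @ coeffs                   # synthesis with the dual
print(gram)
```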

The redundancy for pyramid frames is A_1 = 3/2 at level 1 and A_2 = 7/4 at
level 2, leading to A_∞ = 2 (see Figure 10.21), far less than that of the shift-invariant
DWT construction we will see in a moment. Thanks to this bounded redundancy,
[Figure 10.21: sampling grid; horizontal axis n.]

Figure 10.21: Sampling grid corresponding to the time-frequency tiling of the pyramid 
coding scheme (points — nonredundant, squares — redundant). 




[Figure 10.22: image panels.]



Figure 10.22: Two-level pyramid decomposition of an image x. A first-level coarse
approximation α_1 is computed. A first-level prediction error β_1 is obtained as the difference
of x and the prediction calculated on α_1. A second-level coarse approximation α_2 is
computed. A second-level prediction error β_2 is obtained as the difference of α_1 and the
prediction calculated on α_2. The scheme is redundant, as the total number of samples
in expansion coefficients β_1, β_2, α_2 is (1 + 1/4 + 1/16) times the number of original image
samples, yielding redundancy of about 31%.



pyramid coding has been used together with directional coding to form the basis 
for nonseparable multidimensional frames called contourlets (see Further Reading). 
An example of a pyramid decomposition of an image is given in Figure 10.22.
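The redundancy figures quoted for pyramid frames follow from adding up the relative sizes of all retained outputs. A minimal sketch with exact fractions, assuming downsampling by 2 along each of `dim` dimensions at every level (the function name is hypothetical):

```python
from fractions import Fraction

def pyramid_redundancy(J, dim=1):
    """Number of pyramid coefficients divided by the number of input samples.

    Each of the J levels keeps a prediction error at the resolution of the
    previous level, plus one coarse approximation at the coarsest resolution.
    """
    r = Fraction(1, 2 ** dim)                # size ratio between consecutive levels
    errors = sum(r ** lev for lev in range(J))
    return errors + r ** J

print(pyramid_redundancy(1), pyramid_redundancy(2))   # 1D: 3/2 7/4
print(pyramid_redundancy(2, dim=2))                   # 2D, two levels: 21/16
```

As J grows, the 1D value approaches 2 and the 2D value approaches 4/3 ≈ 1.33.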

10.5.3 Shift-Invariant DWT 

The shift-invariant DWT is basically the nondownsampled DWT (an example for
J = 3 levels is shown in Figure 10.23). It is sometimes called the stationary wavelet
transform, or algorithme à trous,¹⁴⁷ after the implementation algorithm of
the same name (see Section 10.6.1).

Let g and h be the filters used in this filter bank. At level ℓ we will have
equivalent upsampling by 2^ℓ, which means that the filter moved across the upsamplers
will be upsampled by 2^{ℓ−1}, inserting (2^{ℓ−1} − 1) zeros between every two samples
and thus creating holes (hence the name algorithm with holes).

Figure 10.24 shows the sampling grid for the shift-invariant DWT, from which
it is clear that this scheme is completely redundant, as all points are computed.
This is in contrast to a completely nonredundant scheme such as the DWT shown
in Figure 10.16(a). In fact, while the redundancy per level of this algorithm grows
exponentially, since A_1 = 2, A_2 = 4, ..., A_J = 2^J, ..., the total redundancy for J
levels is linear, as A = A_J 2^{−J} + Σ_{ℓ=1}^{J} A_ℓ 2^{−ℓ} = J + 1. This growing redundancy is



147 From French for algorithm with holes, coming from the computational method that can take
advantage of upsampled filter impulse responses, discussed in Section 10.6.1.
[Figure 10.23: block diagram with branches labeled level 1, level 2 and level 3; filters H(z), H(z^2), H(z^4), G(z), G(z^2), G(z^4).]



Figure 10.23: The synthesis part of the equivalent 4-channel filter bank implementing
the shift-invariant DWT with J = 3 levels. The analysis part is analogous, and the filters
are given in (10.92). This is the same scheme as in Figure 10.18 with all the upsamplers
removed.
[Figure 10.24: sampling grid; horizontal axis n.]



Figure 10.24: Sampling grid corresponding to the time-frequency tiling of the shift- 
invariant DWT (points — nonredundant, squares — redundant). 



the price we pay for shift invariance as well as for the simplicity of the algorithm. The
2D version of the algorithm is obtained by extending the 1D version in a separable
manner, leading to the total redundancy of A = A_J 2^{−J} + 3 Σ_{ℓ=1}^{J} A_ℓ 2^{−ℓ} = 3J + 1.
Exercise 10.12 illustrates the redundancy of such a frame.
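The two redundancy formulas can be checked with exact arithmetic. The sketch evaluates A = A_J 2^{−J} + b Σ_{ℓ=1}^{J} A_ℓ 2^{−ℓ}, where b is the number of highpass bands per level (1 in 1D, 3 in the separable 2D case) and the per-level redundancy defaults to A_ℓ = 2^ℓ as in the text. Passing A_ℓ = 1 recovers the nonredundant DWT and A_ℓ = 2 the oversampled DWT of Section 10.5.1. The function name and its arguments are hypothetical.

```python
from fractions import Fraction

def dwt_family_redundancy(J, bands_per_level=1, per_level=lambda ell: 2 ** ell):
    """A = A_J 2^{-J} + bands_per_level * sum_{ell=1}^{J} A_ell 2^{-ell}."""
    w = lambda ell: Fraction(1, 2 ** ell)
    return per_level(J) * w(J) + bands_per_level * sum(
        per_level(ell) * w(ell) for ell in range(1, J + 1))

J = 3
print(dwt_family_redundancy(J))                             # shift-invariant DWT, 1D: J + 1
print(dwt_family_redundancy(J, bands_per_level=3))          # separable 2D: 3J + 1
print(dwt_family_redundancy(J, per_level=lambda ell: 1))    # critically sampled DWT: 1
print(dwt_family_redundancy(J, per_level=lambda ell: 2))    # oversampled DWT: 2
```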



10.6 Computational Aspects 

10.6.1 The Algorithm à Trous

This algorithm was introduced as a fast implementation of the dyadic (continuous)
wavelet transform by Holschneider, Kronland-Martinet, Morlet, and Tchamitchian
in 1989, and corresponds to the DWT with samplers removed. We introduced
it in Section 10.5 as the shift-invariant DWT and showed an example for J = 3 in
Figure 10.23. The equivalent filters in each branch are computed first, and then
the samplers are removed. Because the equivalent filters are convolutions with
upsampled filters, the algorithm can be computed efficiently: the holes produced
by upsampling are zero taps that require no multiplications.
aTrous(α^(0))
Input: x = α^(0), the input signal.
Output: α^(J), β^(ℓ), ℓ = 1, 2, ..., J, transform coefficients.

initialize
for ℓ = 1 to J do
    α^(ℓ) = α^(ℓ−1) ∗ (↑2^(ℓ−1)) g
    β^(ℓ) = α^(ℓ−1) ∗ (↑2^(ℓ−1)) h
end for
return α^(J), β^(ℓ), ℓ = 1, 2, ..., J

Table 10.5: Algorithme à trous implementing the shift-invariant DWT. Upsampling an
impulse response g by a factor of n is denoted by (↑n)g.
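A direct transcription of Table 10.5 into Python is sketched below under stated assumptions: finite signals are treated as periodic (circular convolution), which keeps every output the same length as the input and makes shift invariance exact; filters are plain NumPy arrays; and the helper names are hypothetical. Skipping the zero taps of the upsampled filters is where the holes save work.

```python
import numpy as np

def upsample_filter(f, m):
    """(up m) f: insert m - 1 zeros ("holes") between consecutive taps."""
    out = np.zeros((len(f) - 1) * m + 1)
    out[::m] = f
    return out

def circ_filter(x, f):
    """Circular convolution y_n = sum_k f_k x_{(n - k) mod N}."""
    y = np.zeros(len(x))
    for k, fk in enumerate(f):
        if fk != 0.0:                     # holes cost nothing: skip zero taps
            y += fk * np.roll(x, k)
    return y

def a_trous(x, g, h, J):
    """Shift-invariant DWT via the algorithme a trous (Table 10.5)."""
    alpha = np.asarray(x, dtype=float)    # alpha^(0) = x
    betas = []
    for ell in range(1, J + 1):
        m = 2 ** (ell - 1)
        betas.append(circ_filter(alpha, upsample_filter(h, m)))   # beta^(ell)
        alpha = circ_filter(alpha, upsample_filter(g, m))        # alpha^(ell)
    return alpha, betas

s = 1 / np.sqrt(2)
g, h = np.array([s, s]), np.array([s, -s])   # Haar filters (assumed convention)
x = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0,
              0.0, 0.0, 3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

alpha, betas = a_trous(x, g, h, J=3)
alpha_s, betas_s = a_trous(np.roll(x, 1), g, h, J=3)   # circularly shifted input
```

Shifting the input by one sample shifts every output by one sample, and the J + 1 outputs of length N give the redundancy J + 1 derived in Section 10.5.3. Each β^(ℓ) also equals x filtered by the equivalent filter of (10.92).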



10.6.2 Efficient Gabor and Spectrum Computation 

10.6.3 Efficient Sparse Frame Expansions 

Matching Pursuit 

Orthogonal Matching Pursuit
Linear Programming 
Chapter at a Glance

This chapter relaxed the constraint of nonredundancy that bases carry, using frames to achieve
robustness and freedom in choosing not only the best expansion, but also, given a fixed
expansion, the best expansion coefficients under desired constraints. We introduced these
mostly on finite-dimensional frames, as they can be easily visualized via rectangular matrices.
The infinite-dimensional frames we discussed were only those implementable by
oversampled filter banks, summarized in Table 10.6.



Oversampled Filter Bank

Block diagram
  [M-channel analysis/synthesis filter bank with sampling by N.]

Basic characteristics
  number of channels       M > N
  sampling factor          N
  channel sequences        α_{i,n}                        i = 0, 1, ..., M − 1

Filters                    Synthesis     Analysis
  filter i                 g_{i,n}       g̃_{i,n}          i = 0, 1, ..., M − 1
  polyphase component j    g_{i,j,n}     g̃_{i,j,n}        j = 0, 1, ..., N − 1

Table 10.6: Oversampled filter bank.



                Block transforms    Overlapped transforms    Time-frequency constraints
                (Fourier-like)      (Fourier-like)           (wavelet-like)

Bases           DFT                 LOT                      DWT
Frames          HTF                 Local Fourier            Oversampled DWT

Table 10.7: Bases versus frames.

We discussed two big classes of frames following their counterparts in bases: local Fourier
frames and wavelet frames. Table 10.7 depicts relationships existing between various
classes of bases and frames. For example, the frame counterpart of the DFT among block
transforms is the harmonic tight frame, while that of the LOT is the local Fourier frame,
obtained by either complex-exponential or cosine modulation. By increasing the support
of basis functions we can go from the DFT to the LOT, and similarly,
from harmonic tight frames to local Fourier frames. Imposing time-frequency constraints
leads to new classes of representations, such as the DWT, whose frame counterpart is the
oversampled DWT.

Historical Remarks 

In the signal processing and harmonic analysis communities, frames are generally considered
to have been born in 1952 in the paper by Duffin and Schaeffer [50]. Despite being
over half a century old, frames gained popularity only in the 1990s, due mostly to the work
of three wavelet pioneers: Daubechies, Grossmann and Meyer [42]. An important piece in
understanding frames came with Naimark's theorem, known for a long time in operator
algebra and used in quantum information theory, and rediscovered by several people in
the 1990s, among others, Han and Larson [67]; they came up with the idea that a frame
could be obtained by compressing a basis in a larger space.

The idea behind the class of complex exponential-modulated frames, consisting of
many families, dates back to Gabor [55] and his insight of constructing bases by modulation of
a single prototype function. Gabor originally used complex-exponential modulation, and
thus, all those families with the same type of modulation are termed complex exponential-modulated
frames, or sometimes, Gabor frames. Other types of modulation are possible,
such as cosine modulation, and again, all those families with cosine modulation are termed
cosine-modulated frames.

Frame-like ideas, that is, building redundancy into a signal expansion, can be found 
in numerous fields, from source and channel coding, to communications, classification, 
operator and quantum theory. 

Further Reading 

Books and Textbooks The sources on frames are the book by Daubechies [41], a text
by Christensen [29], a number of classic papers [25, 69, 40, 67], as well as an introductory
tutorial on frames by Kovacevic and Chebira [90].

Results on Frames A thorough analysis of oversampled filter banks implementing frame
expansions is given in [37, 16, 38]. Following up on the result of Benedetto and Fickus [9]
on minimizing frame potential from Section 10.2.1, Casazza, Fickus, Kovacevic, Leon and
Tremain extended the result to nonequal-norm tight frames, giving rise to the fundamental
inequality, which has ties to the capacity region in synchronous CDMA systems [170].
Casazza and Kutyniok in [27] investigated a Gram-Schmidt-like procedure for producing
tight frames. In [13], the authors introduce a quantitative notion of redundancy through
local redundancy and a redundancy function, applicable to all finite-dimensional frames.

Local Fourier Frames For finite-dimensional frames, ideas similar to those of harmonic
tight frames have appeared in the work by Eldar and Bölcskei [51] under the name geometrically
uniform frames: frames defined over a finite Abelian group of unitary matrices, with
either a single generator or multiple generators. Harmonic tight frames have been generalized
in the works by Vale and Waldron [163], as well as Casazza and Kovacevic [26].

Harmonic tight frames, as well as equiangular frames (where |⟨φ_i, φ_j⟩| is a constant
for i ≠ j) [119], have strong connections to Grassmannian frames. In a comprehensive
paper [146], Strohmer and Heath discuss those frames and their connection to Grassmannian
packings, spherical codes, graph theory and Welch bound sequences (see also [75]).
The lack of infinite-dimensional bases with good time and frequency localization,
the result of the discrete Balian-Low theorem, prompted the development of oversampled
filter banks that use complex-exponential modulation. They are known under various
names: oversampled DFT filter banks, complex exponential-modulated filter banks, short-time
Fourier filter banks and Gabor filter banks, and have been studied in [36, 16, 15, 14,
153, 145]. Bölcskei and Hlawatsch in [15] have studied the other type of modulation, cosine.
The connection between these two classes is deep, as there exists a general decomposition
of the frame operator corresponding to a cosine-modulated filter bank as the sum of the
frame operator of the underlying complex exponential-modulated frame and an additional
operator, which vanishes under certain conditions [7]. The lapped tight frame transforms
were proposed as a way to obtain a large number of frames by seeding from LOTs [28, 125].

Wavelet Frames Apart from those already discussed, like pyramid frames [21], many
other wavelet-like frame families have been proposed, among them the dual-tree complex
wavelet transform, a nearly shift-invariant transform with redundancy of only 2, introduced
by Kingsbury [85, 86, 87]. Selesnick in [127, 128] followed with the double-density DWT
and variations, which can approximately be implemented using a 3-channel filter bank
with sampling by 2, again nearly shift invariant, with redundancy that tends towards
2 when iterated. Some other variations include the power-shiftable DWT [132] or the partial
DWT [137], which removes samplers at the first level but leaves them at all other levels,
with redundancy A_j = 2 at each level and again near shift invariance. Bradley in [18]
introduces the overcomplete DWT, the DWT with critical sampling for the first k levels
followed by the shift-invariant DWT for the last J − k levels.

Multidimensional Frames Apart from obvious, tensor-like constructions of multidimensional
frames, true multidimensional solutions exist. The oldest multidimensional
frame seems to be the steerable pyramid introduced by Simoncelli, Freeman, Adelson
and Heeger in 1992 [132], following on the previous work by Burt and Adelson on pyramid
coding [21]. The steerable pyramid possesses many nice properties, such as joint
space-frequency localization, approximate shift invariance, approximate tightness and
approximate rotation invariance. An excellent overview of the steerable pyramid and its
applications is given on Simoncelli's web page [131].

Another multidimensional example is the work of Do and Vetterli on contourlets [44,
135], motivated by the need to construct efficient and sparse representations of the intrinsic
geometric structure of information within an image. The authors combine the ideas of
pyramid filter banks [43] with directional processing to obtain contourlets, expansions
capturing contour segments. These are almost critically sampled, with redundancy of
1.33.

Some other examples include [97], where the authors build both critically sampled
and shift-invariant 2D DWTs. Many "-lets" are also multidimensional frames, such as
curvelets [24, 23] and shearlets [94]. As the name implies, curvelets are used to approximate
curved singularities in an efficient manner [24, 23]. As opposed to wavelets, which use
dilation and translation, shearlets use dilation, shear transformation and translation, and
possess useful properties such as directionality, elongated shapes and many others [94].

Applications of Frames Frames have become extremely popular and have been used in 
many application fields. The text by Kovacevic and Chebira [90] contains an overview of 
many of these and a number of relevant references. In some fields, frames have been used 
for years, for example in CDMA systems, in the work of Massey and Mittelholzer [104] on
the Welch bound and sequence sets for CDMA systems. It turns out that the Welch bound is
equivalent to the frame potential minimization inequality. The equivalence between unit-norm
tight frames and Welch bound sequences was shown in [146]. Waldron formalized that
equivalence for general tight frames in [171], and consequently, tight frames are referred
to in some works as Welch bound sequences [150].



Exercises with Solutions 

10.1. Expansion of a Complex Exponential and a Kronecker Delta 

Given is a sequence x consisting of a single complex sinusoidal sequence of unknown frequency
(2π/N)ℓ and a Kronecker delta sequence of unknown location k:

$$x_n = \beta_1 e^{j(2\pi/N)\ell n} + \beta_2\,\delta_{n-k}$$

for n = 0, 1, ..., N − 1, and ℓ, k ∈ {0, 1, ..., N − 1}. Given a frame consisting of a DFT
basis and an identity basis as in (10.28), compute the expansion coefficients α = Φ^* x,
which, as per (10.29), lead to perfect reconstruction x = (1/2)Φα.

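Solution: Consider expansion coefficients α_i = ⟨x, φ_i⟩.

If φ_i is one of the DFT basis vectors (2.160), that is, i = 0, 1, ..., N − 1, then

$$\begin{aligned}
\alpha_i &= \sum_{n=0}^{N-1} x_n\,\varphi_{i,n}^* = \sum_{n=0}^{N-1} \bigl(\beta_1 e^{j(2\pi/N)\ell n} + \beta_2\,\delta_{n-k}\bigr)\,\tfrac{1}{\sqrt{N}}\, e^{-j(2\pi/N)in} \\
&= \frac{\beta_1}{\sqrt{N}} \sum_{n=0}^{N-1} e^{j(2\pi/N)(\ell-i)n} + \frac{\beta_2}{\sqrt{N}}\, e^{-j(2\pi/N)ik}
\overset{(a)}{=} \begin{cases} \sqrt{N}\,\beta_1 + \frac{\beta_2}{\sqrt{N}}\, e^{-j(2\pi/N)\ell k}, & i = \ell; \\ \frac{\beta_2}{\sqrt{N}}\, e^{-j(2\pi/N)ik}, & i \neq \ell, \end{cases}
\end{aligned}$$

where (a) follows from the orthogonality of the roots of unity (2.277c). Therefore, among
the first N expansion coefficients, if Nβ_1 ≫ β_2, only one will stand out, namely, the one
matching the frequency (2π/N)ℓ.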

If, on the other hand, φ_i is one of the standard basis vectors, that is, i = N, N + 1,
..., 2N − 1, then

$$\begin{aligned}
\alpha_i &= \sum_{n=0}^{N-1} x_n\,\varphi_{i,n}^* = \sum_{n=0}^{N-1} \bigl(\beta_1 e^{j(2\pi/N)\ell n} + \beta_2\,\delta_{n-k}\bigr)\,\delta_{n+N-i} \\
&= \beta_1 \sum_{n=0}^{N-1} e^{j(2\pi/N)\ell n}\,\delta_{n+N-i} + \beta_2 \sum_{n=0}^{N-1} \delta_{n-k}\,\delta_{n+N-i}
= \begin{cases} \beta_1 e^{j(2\pi/N)\ell k} + \beta_2, & i = N + k; \\ \beta_1 e^{j(2\pi/N)\ell(i-N)}, & i \neq N + k. \end{cases}
\end{aligned}$$

Thus, among the last N expansion coefficients, if β_2 ≫ β_1, only one will stand out, the
coefficient matching the location of the Kronecker delta impulse, k. In other words, if we
choose β_1 ~ O(1) and β_2 ~ O(√N), both conditions hold and the two coefficients localizing
the complex sinusoid and the Kronecker delta impulse will stand out. Note, however, that while
this is a minimum ℓ²-norm solution, it is not sparse. It is possible to find an expansion
coefficient vector α′ with exactly 2 nonzero coefficients; it will be sparse, with a larger ℓ²-norm.
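The solution can be checked numerically. The sketch builds the N × 2N frame Φ = [DFT | I] with unit-norm DFT vectors φ_{i,n} = e^{j(2π/N)in}/√N, computes α = Φ^* x, verifies x = (1/2)Φα, and locates the standout coefficient in each half. The values of N, ℓ, k, β_1 and β_2 are arbitrary example choices, with β_2 = √N following the scaling discussed above.

```python
import numpy as np

N = 16
ell, k = 3, 11                  # frequency index and impulse location (example values)
b1, b2 = 1.0, np.sqrt(N)        # beta_1 ~ O(1), beta_2 ~ O(sqrt(N))

n = np.arange(N)
x = b1 * np.exp(2j * np.pi * ell * n / N) + b2 * (n == k)

F = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)  # column i: DFT vector phi_i
Phi = np.hstack([F, np.eye(N)])                           # N x 2N: DFT basis + identity

alpha = Phi.conj().T @ x        # alpha_i = <x, phi_i>
x_rec = 0.5 * Phi @ alpha       # tight frame with frame bound 2

print(np.argmax(np.abs(alpha[:N])), np.argmax(np.abs(alpha[N:])))   # 3 11
```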

10.2. Canonical Dual Frame Minimizes ℓ² Norm of an Expansion

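Given a frame Φ and its canonical dual frame (10.65a), show that among all possible dual
frames, the canonical dual frame minimizes ‖α‖².

Solution: The solution uses the geometry of the expansion, as we have seen in Section 10.2.
The frame Φ is of size N × M and rank(Φ) = N. Moreover, its null space is of dimension (M − N)
according to Table 1.2. Call φ_i^⊥, i = 0, 1, ..., M − N − 1, the vectors spanning that null
space, that is, Φφ_i^⊥ = 0, for i = 0, 1, ..., M − N − 1. Call Φ̃^⊥ the following matrix of size
N × M:

where Γ is an arbitrary, rank-(M − N), (M − N) × M matrix of scalars. In other words, Φ̃^⊥
is just a matrix of M vectors spanning the null space of Φ (linear combinations of the φ_i^⊥).
We thus know that

$$\Phi\,(\tilde{\Phi}^\perp)^* = 0,$$

for any Γ. This also means that any dual of Φ can be written as

$$\tilde{\Phi} + \tilde{\Phi}^\perp,$$

since

$$\Phi\,(\tilde{\Phi} + \tilde{\Phi}^\perp)^* = \Phi\tilde{\Phi}^* + \Phi\,(\tilde{\Phi}^\perp)^* = I.$$

We can now express the expansion coefficients as a function of Φ̃^⊥ as

$$\alpha(\tilde{\Phi}^\perp) = \bigl(\tilde{\Phi}^* + (\tilde{\Phi}^\perp)^*\bigr)\,x.$$

To minimize its energy, we have to minimize

$$\begin{aligned}
\|\alpha(\tilde{\Phi}^\perp)\|^2 &= \alpha(\tilde{\Phi}^\perp)^*\,\alpha(\tilde{\Phi}^\perp) = x^*\,(\tilde{\Phi} + \tilde{\Phi}^\perp)\bigl(\tilde{\Phi}^* + (\tilde{\Phi}^\perp)^*\bigr)\,x \\
&= x^*\bigl(\tilde{\Phi}\tilde{\Phi}^* + \tilde{\Phi}(\tilde{\Phi}^\perp)^* + \tilde{\Phi}^\perp\tilde{\Phi}^* + \tilde{\Phi}^\perp(\tilde{\Phi}^\perp)^*\bigr)\,x.
\end{aligned}$$

Because the canonical dual is Φ̃ = T^{−1}Φ with T = ΦΦ^*, the cross terms vanish,
Φ̃(Φ̃^⊥)^* = T^{−1}Φ(Φ̃^⊥)^* = 0, leaving x^*Φ̃Φ̃^*x + ‖(Φ̃^⊥)^* x‖², a quadratic form
minimized for Φ̃^⊥ = 0 (actually Γ = 0).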
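A quick numerical check of this argument uses a random real frame, an arbitrary example not taken from the text. The sketch builds the canonical dual T^{−1}Φ, forms another dual by adding a matrix Φ̃^⊥ whose transpose has columns in the null space of Φ (obtained here from an SVD, one convenient choice), and compares the ℓ² norms of the two coefficient vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 5
Phi = rng.standard_normal((N, M))          # generic real frame of rank N

T = Phi @ Phi.T
Phi_can = np.linalg.solve(T, Phi)          # canonical dual T^{-1} Phi

_, _, Vt = np.linalg.svd(Phi)              # last M - N right singular vectors
U = Vt[N:].T                               # span the null space: Phi @ U = 0
Gamma = rng.standard_normal((M - N, N))    # arbitrary coefficients
Phi_perp = (U @ Gamma).T                   # N x M, with Phi @ Phi_perp.T = 0
Phi_other = Phi_can + Phi_perp             # another dual frame

x = rng.standard_normal(N)
a_can = Phi_can.T @ x                      # coefficients from the canonical dual
a_other = Phi_other.T @ x                  # coefficients from the other dual
print(np.linalg.norm(a_can), np.linalg.norm(a_other))
```

Both coefficient vectors reconstruct x exactly, but the squared norm of `a_other` exceeds that of `a_can` by exactly ‖(Φ̃^⊥)^* x‖², as the vanishing cross terms predict.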

10.3. Computation of the Canonical Dual Frame 

Given a frame Φ and its canonical dual frame (10.65a), show that it can be computed as:

$$\tilde{\Phi} = \frac{2}{\lambda_{\min} + \lambda_{\max}} \sum_{k=0}^{\infty} \left(I - \frac{2}{\lambda_{\min} + \lambda_{\max}}\,T\right)^{k} \Phi.$$

Solution: We just sketch a proof here; see [41] for a rigorous development.
If the frame bounds λ_min and λ_max are close, that is, if

$$\eta = \kappa(T) - 1 \ll 1,$$

then (10.62) implies that T is close to ((λ_min + λ_max)/2) I, or T^{−1} is close to (2/(λ_min + λ_max)) I.
This further means that x can be written as follows:

$$x = \frac{2}{\lambda_{\min} + \lambda_{\max}} \sum_{i=0}^{M-1} \langle x, \varphi_i \rangle\, \varphi_i + Rx,$$

where R is given by (use (10.64b))

$$R = I - \frac{2}{\lambda_{\min} + \lambda_{\max}}\,T. \qquad (E10.3\text{-}1)$$
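Using (10.62) we obtain

$$-\frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}\, I \;\le\; R \;\le\; \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}\, I,$$

and, as a result,

$$\|R\| \le \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{\kappa(T) - 1}{\kappa(T) + 1} = \frac{\eta}{2 + \eta} < 1. \qquad (E10.3\text{-}2)$$

From (E10.3-1) and (E10.3-2), T^{−1} can be written as

$$T^{-1} = \frac{2}{\lambda_{\min} + \lambda_{\max}}\,(I - R)^{-1} = \frac{2}{\lambda_{\min} + \lambda_{\max}} \sum_{k=0}^{\infty} R^k,$$

implying that

$$\tilde{\varphi}_i = T^{-1}\varphi_i = \frac{2}{\lambda_{\min} + \lambda_{\max}} \sum_{k=0}^{\infty} \left(I - \frac{2}{\lambda_{\min} + \lambda_{\max}}\,T\right)^{k} \varphi_i.$$

As for biorthogonal bases in Section 1.5.3, we can characterize the convergence using the
condition number of the Hermitian matrix T, κ(T) = λ_max/λ_min. Thus, when κ(T) is
large, convergence is slow, while as κ(T) tends to 1, convergence is faster, and the dual
frame is close to Φ (up to the scaling 2/(λ_min + λ_max)). Specifically, when all the vectors
are of unit norm and λ_min = λ_max = 1, we have an orthonormal basis and T^{−1} = I.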
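The series is straightforward to implement. The sketch below truncates it after K + 1 terms and compares the result against a direct solve. As an example input it uses the perturbed frame (E10.4-1) from Exercise 10.4 with θ = π/4, an arbitrary choice; the helper name `dual_by_series` is hypothetical.

```python
import numpy as np

def dual_by_series(Phi, K):
    """Approximate the canonical dual T^{-1} Phi with K + 1 terms of the series."""
    T = Phi @ Phi.conj().T
    lam = np.linalg.eigvalsh(T)            # ascending: lam[0] = lambda_min, lam[-1] = lambda_max
    c = 2.0 / (lam[0] + lam[-1])
    R = np.eye(T.shape[0]) - c * T         # R = I - c T, with ||R|| < 1
    term = c * Phi                         # k = 0 term
    approx = term.copy()
    for _ in range(K):
        term = R @ term                    # c R^k Phi for the next k
        approx = approx + term
    return approx

theta = np.pi / 4
Phi = np.array([[np.cos(theta), -0.5, -0.5],
                [np.sin(theta), np.sqrt(3) / 2, -np.sqrt(3) / 2]])   # frame (E10.4-1)
exact = np.linalg.solve(Phi @ Phi.T, Phi)
errors = [np.linalg.norm(dual_by_series(Phi, K) - exact) for K in (0, 5, 20, 60)]
print(errors)
```

The truncation error after K + 1 terms is R^{K+1}T^{−1}Φ, so it shrinks roughly by the factor (κ(T) − 1)/(κ(T) + 1) per added term, which makes the dependence on κ(T) discussed above concrete.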
10.4. Condition Numbers and Convergence 

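Compute the condition number and comment on the convergence of the iterative computation of Φ̃
for the following three frames:

(i) Frame given in (10.19).

(ii) Frame given in (10.13), where the first vector is perturbed by θ, so that the resulting
frame is now

$$\Phi = \begin{bmatrix} \cos\theta & -\tfrac{1}{2} & -\tfrac{1}{2} \\ \sin\theta & \tfrac{\sqrt{3}}{2} & -\tfrac{\sqrt{3}}{2} \end{bmatrix}. \qquad (E10.4\text{-}1)$$

(iii) Frame given by

$$\Phi = \begin{bmatrix} 1 & \cos\theta & \cos 2\theta \\ 0 & \sin\theta & \sin 2\theta \end{bmatrix}. \qquad (E10.4\text{-}2)$$

Solution: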



(i) We start with (10.19). We have already computed the eigenvalues of ΦΦ^T. It turns
out that T = ΦΦ^* = ΦΦ^T (the frame is real), and thus, those eigenvalues are the same,
λ_min = 1, λ_max = 3. The convergence depends on κ(T) = 3, meaning that it is reasonable but
not very close to one.

Turning this frame into a unit-norm one (see Exercise 10.6), we get a new frame Φ′
with new frame bounds

$$\lambda'_{\min} = \tfrac{1}{2}\,(3 - \sqrt{5}), \qquad \lambda'_{\max} = \tfrac{1}{2}\,(3 + \sqrt{5}),$$

and new condition number of

$$\kappa(T') = \frac{\lambda'_{\max}}{\lambda'_{\min}} = \frac{3 + \sqrt{5}}{3 - \sqrt{5}} \approx 6.85,$$

more than twice as large as for the unnormalized version, where κ(T) = 3.
[Figure E10.4-1: panels (a) to (f); panels (e) and (f) plot the logarithm of the Frobenius-norm difference of successive estimates of T^{−1}.]



Figure E10.4-1: Two example frames, (a) from (E10.4-1) and (b) from (E10.4-2), and
the behavior of their respective κ(T) in (c) and (d). In (e) and (f) we show the convergence
behavior of the Frobenius norm of the difference between the true dual frame and that
computed using (10.67) for (c) and (d), respectively.



(ii) This frame is given in Figure E10.4-1(a). The eigenvalues of the corresponding T
are

$$\lambda_{\min} = \tfrac{3}{2} - |\sin\theta|, \qquad \lambda_{\max} = \tfrac{3}{2} + |\sin\theta|, \qquad \kappa(T) = \frac{3 + 2|\sin\theta|}{3 - 2|\sin\theta|}.$$

Figure E10.4-1(c) illustrates the behavior of the ratio κ(T). For θ = 0, we get our
original, unperturbed frame back, and thus, κ(T) = 1, meaning that T = (3/2)I and
no inversion is necessary to compute the dual frame. When θ = π/2, κ(T) reaches its
maximum of 5, when the perturbed φ_0 finds itself between φ_1 and −φ_2. The frame is
not well behaved and (10.67) will take a long time to converge. The ratio κ(T) then
falls back to 1 for θ = π, as then the perturbed φ_0 is just −φ_0. Figure E10.4-1(e)
illustrates convergence for a few angles θ. In the figure, T^{−1}_K denotes T^{−1} computed
via (10.67) using (K + 1) terms of the summation. Convergence is measured using
the logarithm of the difference, in the squared Frobenius norm (see (1.217)), of the
dual frame operator between two consecutive iterations. We see that, as expected,
convergence is the fastest for θ = π/8 (smallest κ(T)) and slows down as θ increases to
π/2. Figure E10.4-2 illustrates the behavior of the dual frame graphically for three
values, θ = 0, π/4, π/2; we can see how the dual frame degenerates from the dual
frame for θ = 0 (basically the frame itself, just scaled), through θ = π/4, and finally
to θ = π/2, where the first vector is very small and the other two are very close to
being collinear.




[Figure E10.4-2: panels (a) and (b).]

Figure E10.4-2: Three different frames from Figure E10.4-1(a) for θ = 0, π/4, π/2. (a)
Frame vector φ_0 for θ = 0 (red), θ = π/4 (black) and θ = π/2 (blue). Frame vectors φ_1, φ_2
are the same for all three frames. (b) Corresponding dual frames for θ = 0 (red), θ = π/4
(black) and θ = π/2 (blue).



(iii) This frame is given in Figure E10.4-1(b). The eigenvalues of the corresponding T
are

$$\lambda_{\min} = \tfrac{1}{2}\left(3 - |2\cos 2\theta + 1|\right), \qquad \lambda_{\max} = \tfrac{1}{2}\left(3 + |2\cos 2\theta + 1|\right), \qquad \kappa(T) = \frac{3 + |2\cos 2\theta + 1|}{3 - |2\cos 2\theta + 1|}.$$

Figure E10.4-1(d) illustrates the behavior of the ratio κ(T). For θ = π/3 and
θ = 2π/3, the ratio is 1 and no inversion is necessary. Note that for θ = 2π/3 we
have exactly the same frame we just saw (unperturbed version), the same one from
(10.13). For θ ∈ [π/3, 2π/3], the ratio is between 1 and 2, with κ(T) = 2 for θ = π/2.
However, for θ < π/3 and θ > 2π/3, the ratio grows and becomes unbounded as θ
approaches 0 or π. This is easily understood by considering what happens around θ = 0:
all three frame vectors are identical (or very close to each other), and thus, the frame
collapses to a single vector. Figure E10.4-1(f) illustrates convergence for a few angles θ.
We see that, as expected, convergence is slower for θ = π/2 than for θ = 3π/8 (larger κ(T))
and the slowest for θ = π/8.



These simple examples illustrated the meaning behind the frame bounds λ_min and λ_max
and the behavior of their ratio κ(T).
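The closed-form condition numbers derived for frames (E10.4-1) and (E10.4-2) can be compared against eigenvalues computed directly from T = ΦΦ^T. A minimal sketch; the function names are hypothetical, and the angles printed are example values.

```python
import numpy as np

def kappa(Phi):
    """Condition number lambda_max / lambda_min of T = Phi Phi^*."""
    lam = np.linalg.eigvalsh(Phi @ Phi.conj().T)
    return lam[-1] / lam[0]

def frame_ii(theta):     # (E10.4-1): first vector of (10.13) perturbed by theta
    return np.array([[np.cos(theta), -0.5, -0.5],
                     [np.sin(theta), np.sqrt(3) / 2, -np.sqrt(3) / 2]])

def frame_iii(theta):    # (E10.4-2)
    return np.array([[1.0, np.cos(theta), np.cos(2 * theta)],
                     [0.0, np.sin(theta), np.sin(2 * theta)]])

def kappa_ii(theta):     # closed form from part (ii)
    s = abs(np.sin(theta))
    return (3 + 2 * s) / (3 - 2 * s)

def kappa_iii(theta):    # closed form from part (iii)
    c = abs(2 * np.cos(2 * theta) + 1)
    return (3 + c) / (3 - c)

for theta in (np.pi / 8, np.pi / 4, np.pi / 2):
    print(round(kappa(frame_ii(theta)), 4), round(kappa(frame_iii(theta)), 4))
```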



Exercises 

10.1. Unit-Norm Tight Frames 

Prove that the following two statements are equivalent: 

(i) {φ_i = [cos θ_i  sin θ_i]^T}_{i=0}^{M−1} is a unit-norm tight frame.

(ii) Σ_{i=0}^{M−1} z_i = 0, where z_i = e^{j2θ_i}, for i = 0, 1, ..., M − 1.



10.2. Parametrization of Finite-Dimensional Frames

Given is a unit-norm tight frame with 4 vectors in R², {φ_i}_{i=0}^{3}. Prove
that all unit-norm tight frames with N = 2, M = 4, are unions of two orthonormal bases,
parameterized by the angle between them (within the equivalence class of all frames obtained
from the original frame by rigid rotations, reflections around an axis and negation
of individual vectors).
(Hint: Use the result of Exercise 10.1.)

10.3. Harmonic Tight Frames 

Harmonic tight frames are obtained from the DFT by seeding, that is, by deleting the last
(M − N) columns of the DFT matrix Ψ = DFT_M from (2.161a), yielding the harmonic
tight frame matrix Φ as in (10.47a).

(i) Show that a harmonic tight frame is indeed tight and find its frame bound. Find its
dual frame Φ̃.
(ii) Consider now applying a running transform, that is, applying Φ^* on an input of
length N and advancing it by N samples at a time. Choose M = 4, N = 2. Find
Φ, Φ^*, and Φ̃ and comment.



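10.4. Vandermonde Frames

Given is a frame with M vectors in an N-dimensional space generated by [1 t_i ⋯ t_i^{N−1}]^T,
t_i ≠ t_j, as follows:

$$\Phi = \begin{bmatrix} \varphi_0 & \varphi_1 & \cdots & \varphi_{M-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ t_0 & t_1 & \cdots & t_{M-1} \\ \vdots & \vdots & \ddots & \vdots \\ t_0^{N-1} & t_1^{N-1} & \cdots & t_{M-1}^{N-1} \end{bmatrix}.$$

(i) Prove that any subset of N frame vectors is linearly independent, and therefore,
that Φ contains $\binom{M}{N}$ different biorthogonal bases. Frames with this property are
said to be maximally robust to erasures, as in Definition 10.4.

(ii) Choose t_i = W_M^i = e^{−j(2π/M)i}, that is, a harmonic tight frame as in (10.47b). Show that
it satisfies (i), that is, it is maximally robust.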

10.5. Useful Frame Facts 

Prove the following frame facts (10.64), where λ_j, j = 0, 1, ..., N − 1, are the eigenvalues of T:

(i) (10.64a): Σ_{j=0}^{N−1} λ_j = Σ_{i=0}^{M−1} ‖φ_i‖²;

(ii) (10.64b): Tx = Σ_{i=0}^{M−1} ⟨x, φ_i⟩ φ_i;
(iii) (10.64c): ⟨x, Tx⟩ = Σ_{i=0}^{M−1} |⟨x, φ_i⟩|²;

(iv) (10.64d): Σ_{j=0}^{N−1} λ_j² = Σ_{i,k=0}^{M−1} |⟨φ_i, φ_k⟩|².

10.6. Frame Normalization 

Show that we can normalize any frame Φ to have all the vectors of unit norm. Find whether
or not the rank, the eigenvalues of T and the frame bounds change due to the normalization.

10.7. Frame Expansions with Positive Expansion Coefficients 

Prove that to obtain a frame expansion with all positive expansion coefficients, one needs
at least M = N + 1 frame vectors in an N-dimensional space. Find one such example.
(Hint: Consider the tight frame for R² in (10.13) first, and then generalize.)



10.8. Oversampled Filter Bank 

Given is a 3-channel analysis/synthesis filter bank with sampling by 2 as in Figure 10.4. The channel signals $\alpha_i$ are filtered by the channel filters $C_i$, $i = 0, 1, 2$, before going through the upsamplers. The analysis, synthesis and channel filters are given by
$$\widetilde{G}_0(z) = z^{-1}, \qquad G_0(z) = 1 - z^{-1}, \qquad C_0(z) = F_0(z),$$
$$\widetilde{G}_1(z) = 1 + z^{-1}, \qquad G_1(z) = z^{-1}, \qquad C_1(z) = F_0(z) + F_1(z),$$
$$\widetilde{G}_2(z) = 1, \qquad G_2(z) = z^{-2} - z^{-1}, \qquad C_2(z) = F_1(z).$$
Verify that the overall system is shift invariant and performs a convolution with the filter $F(z) = z^{-1}\left(F_0(z^2) + z^{-1}F_1(z^2)\right)$.
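A numerical check can accompany the derivation. The sketch below takes the filters as $\widetilde{G}_0(z) = z^{-1}$, $\widetilde{G}_1(z) = 1 + z^{-1}$, $\widetilde{G}_2(z) = 1$, $G_0(z) = 1 - z^{-1}$, $G_1(z) = z^{-1}$, $G_2(z) = z^{-2} - z^{-1}$, with channel filters $C_0 = F_0$, $C_1 = F_0 + F_1$, $C_2 = F_1$ running at the low rate. Every filter is stored as a coefficient array in powers of $z^{-1}$. The choices of $F_0$, $F_1$ and the input are random and illustrative, and the helper names are hypothetical.

```python
import numpy as np

def down2(u):
    """Downsampling by 2: keep u[0], u[2], u[4], ..."""
    return u[0::2]

def up2(u):
    """Upsampling by 2: insert a zero after every sample."""
    v = np.zeros(2 * len(u))
    v[0::2] = u
    return v

def padded_sum(seqs):
    """Add finite sequences of different lengths (zero-padded at the end)."""
    out = np.zeros(max(len(s) for s in seqs))
    for s in seqs:
        out[:len(s)] += s
    return out

def three_channel_fb(x, f0, f1):
    """Analysis filter, downsampler, channel filter, upsampler, synthesis filter; channels summed."""
    analysis = [np.array([0.0, 1.0]), np.array([1.0, 1.0]), np.array([1.0])]
    channel = [f0, padded_sum([f0, f1]), f1]
    synthesis = [np.array([1.0, -1.0]), np.array([0.0, 1.0]), np.array([0.0, -1.0, 1.0])]
    branches = []
    for ga, c, gs in zip(analysis, channel, synthesis):
        a = down2(np.convolve(x, ga))              # channel signal alpha_i
        b = np.convolve(a, c)                      # channel filter C_i at the low rate
        branches.append(np.convolve(up2(b), gs))   # upsample, then synthesis filter G_i
    return padded_sum(branches)

def equivalent_filter(f0, f1):
    """Taps of F(z) = z^{-1} (F0(z^2) + z^{-1} F1(z^2))."""
    f = np.zeros(2 * max(len(f0), len(f1)) + 1)
    f[1:2 * len(f0):2] = f0       # F0 taps land on indices 1, 3, 5, ...
    f[2:2 * len(f1) + 1:2] = f1   # F1 taps land on indices 2, 4, 6, ...
    return f

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
f0, f1 = rng.standard_normal(3), rng.standard_normal(3)
y_fb = three_channel_fb(x, f0, f1)
y_direct = np.convolve(x, equivalent_filter(f0, f1))
n = max(len(y_fb), len(y_direct))
print(np.allclose(np.pad(y_fb, (0, n - len(y_fb))), np.pad(y_direct, (0, n - len(y_direct)))))  # True
```

The three half-rate channels rely on the identity $(F_0 + F_1)(A + B) - F_0 B - F_1 A = F_0 A + F_1 B$, so only three low-rate products are needed instead of four.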

10.9. Overcoming Limitations of Balian-Low Theorem 

Mimic the analysis in Example 10.7 and find the conditions for a 4-channel complex exponential-modulated filter bank with sampling by 2 to implement a tight frame expansion.

10.10. Oversampled Complex Exponential-Modulated Transmultiplexer

Consider a complex exponential-modulated oversampled filter bank implementing a transmultiplexer: Start with a two-channel synthesis filter bank with upsampling by 4, followed by a two-channel analysis filter bank with downsampling by 4. This is a redundant scheme since only 2 signals are multiplexed onto a channel that has 4 times higher rate.

(i) Express this system in the polyphase domain;
(ii) Use the fact that the filters are modulated to express the polyphase matrix as the product of a diagonal matrix and $2 \times 2$ Fourier matrices;
(iii) Express the input-output relationship, and indicate conditions on $G(z)$ and $\widetilde{G}(z)$ to obtain perfect reconstruction;
(iv) Verify that $G(z) = \widetilde{G}(z) = \frac{1}{2}\left(1 + z^{-1} + z^{-2} + z^{-3}\right)$ leads to perfect reconstruction in this redundant case (two-channel synthesis/analysis filter bank with sampling by 4), while it does not for critical sampling (two-channel synthesis/analysis filter bank with sampling by 2).

10.11. Pyramid Filter Banks Implementing Basis Expansions 

Consider a pyramid decomposition as discussed in Section 10.5.2. The analysis filter bank is shown in Figure 10.19 and Figure 10.20(a), and the synthesis in Figure 10.20(b).

(i) Assume that filters $g$ and $\widetilde{g}$ are biorthogonal, that is, they satisfy (7.66), and show perfect reconstruction;
(ii) Assume that filters $g$ and $\widetilde{g}$ are orthogonal, that is, $\widetilde{g}_n = g_{-n}$, and $g$ satisfies (7.13), and show perfect reconstruction. Verify the analysis by showing outputs at different points in the system with Haar filters from Table 7.8.

10.12. Shift-Invariant DWT 

Given is a two-channel orthogonal filter bank as in Theorem 7.1.

(i) Remove samplers from such a filter bank and prove the following energy-conservation equality:
$$\|x\|^2 = \frac{1}{2}\left(\|\alpha\|^2 + \|\beta\|^2\right),$$
where $x$ is the input signal and $\alpha$, $\beta$ are the sequences of transform coefficients at the outputs of the lowpass/highpass channels, respectively;
(ii) Show how to construct a shift-invariant DWT from such a filter bank.
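The factor $\frac{1}{2}$ reflects that, without samplers, each orthogonal channel pair satisfies $|G(e^{j\omega})|^2 + |H(e^{j\omega})|^2 = 2$. The sketch below checks the energy balance numerically for the Haar filter and the 4-tap Daubechies filter. It assumes the highpass $h_n = (-1)^n g_{L-1-n}$, which is one common convention, and its helper names are hypothetical.

```python
import numpy as np

def orthogonal_highpass(g):
    """Highpass h_n = (-1)^n g_{L-1-n} paired with an orthonormal lowpass g (assumed convention)."""
    L = len(g)
    return np.array([(-1) ** n * g[L - 1 - n] for n in range(L)])

def undecimated_energy_ratio(x, g):
    """(||alpha||^2 + ||beta||^2) / ||x||^2 for the two-channel filter bank without samplers."""
    h = orthogonal_highpass(g)
    alpha = np.convolve(x, g[::-1])   # analysis lowpass g_{-n}, implemented causally (a delay keeps energies)
    beta = np.convolve(x, h[::-1])    # analysis highpass h_{-n}, implemented the same way
    return (np.sum(alpha ** 2) + np.sum(beta ** 2)) / np.sum(x ** 2)

s3 = np.sqrt(3.0)
haar = np.array([1.0, 1.0]) / np.sqrt(2)
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
x = np.random.default_rng(0).standard_normal(64)
print(undecimated_energy_ratio(x, haar), undecimated_energy_ratio(x, daub4))  # both close to 2.0
```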

10.13. Cost of Computing with Harmonic Tight Frames 

Given is a harmonic tight frame as in (10.47a) and Exercise 10.3.

(i) With $N = M/2$ and the cost of computing the DFT as in (2.261), what is the cost of computing the frame transform of a vector? What about the dual frame?
(Hint: It is enough to consider the straightforward algorithm; you do not have to try to take advantage of overlaps in your computations.)
(ii) Consider now applying a running transform, that is, applying a length-$M = 2^m$ DFT $\Psi^*$ on an input of length $N = 2^{m-k}$, $k = 1, 2, \ldots, N$, and advancing it by $N$ samples at a time. What is the cost per input sample of computing this running transform as a function of $N$ and $M$?
(Hint: As above, it is enough to consider the straightforward algorithm; you do not have to try to take advantage of overlaps in your computations.)
Chapter 11

Local Fourier Transforms, 
Frames and Bases on 
Functions 



Dear Reader, 



This chapter needs to be finished. The only existing section, Section 11.2, has been proofread and integrated with the previous text. The rest of the sections are yet to be written.

Please read on. — MV, JK, and VKG 

The aim of this chapter follows that of Chapter 8, but for functions. We look for ways to localize the analysis that the Fourier transform provides by windowing the complex exponentials. As before, this will improve the time localization of the corresponding transform at the expense of the frequency localization. The original idea dates back to Gabor, and thus the name Gabor transform is frequently used; the names windowed Fourier transform and short-time Fourier transform are common as well. We choose the intuitive local Fourier transform, as a counterpart to local Fourier bases from Chapter 8 and local Fourier frames from Chapter 10.

We start with the most redundant one, the local Fourier transform, and then sample to obtain local Fourier frames. With critical sampling we then try for local Fourier bases, where, not surprisingly after what we have seen in Chapter 8, bases with simultaneously good time and frequency localization do not exist, the result known as the Balian-Low theorem. Again as in Chapter 8, cosine local Fourier bases do exist, as do wavelet ones we discuss in the next chapter.

11.1 Introduction 

Fourier Series Basis Expansion 

Localization Properties of the Fourier Series 

Chapter Outline 

We start with the most redundant version, the local Fourier transform, in Section 11.2, and then sample to obtain local Fourier frames in Section 11.3. With critical sampling we then try for local Fourier bases in Section 11.4, where, not surprisingly after what we have seen in Chapter 8, complex exponential-modulated local Fourier bases with simultaneously good time and frequency localization do not exist, the result known as the Balian-Low theorem. Again as in Chapter 8, cosine-modulated local Fourier bases do exist, as do wavelet ones we discuss in the next chapter.

Notation used in this chapter: The prototype window function in this chapter is named $p(t)$; this is for consistency with the prototype window sequences used in Chapters 8 and 10; $g(t)$ is more commonly seen in the literature. □



11.2 Local Fourier Transform 

Given a function $x(t)$, we start with its Fourier transform $X(\omega)$ as in Definition 3.10. We analyze $x(t)$ locally by using a prototype window function $p(t)$. We will assume $p(t)$ is symmetric, $p(t) = p(-t)$, and real. The prototype function should be smooth as well; in particular, it should be smoother than the function to be analyzed.$^{148}$

11.2.1 Definition of the Local Fourier Transform 

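We can look at our windowing with the prototype function $p(t)$ in two ways:
(i) We window the function $x(t)$ as
$$x_\tau(t) = p(t-\tau)\,x(t), \qquad (11.1)$$
and then take its Fourier transform (3.48a),
$$X_\tau(\omega) = \langle x_\tau, v_\omega \rangle = \int_{t\in\mathbb{R}} x_\tau(t)\,e^{-j\omega t}\,dt = \int_{t\in\mathbb{R}} x(t)\,p(t-\tau)\,e^{-j\omega t}\,dt, \qquad (11.2)$$
for $\omega \in \mathbb{R}$.
(ii) We window the complex exponentials $v_\omega(t) = e^{j\omega t}$, yielding
$$g_{\Omega,\tau}(t) = p(t-\tau)\,e^{j\Omega t} \;\stackrel{\text{FT}}{\longleftrightarrow}\; G_{\Omega,\tau}(\omega) = e^{-j(\omega-\Omega)\tau}\,P(\omega-\Omega), \qquad (11.3)$$
for $\tau, \Omega \in \mathbb{R}$, and then define a new transform by taking the inner product between $x$ and $g_{\Omega,\tau}(t)$ as
$$X(\Omega,\tau) = \langle x, g_{\Omega,\tau} \rangle = \int_{t\in\mathbb{R}} x(t)\,p(t-\tau)\,e^{-j\Omega t}\,dt, \qquad (11.4)$$
that is, this new transform $X(\Omega,\tau)$ is the Fourier transform of the windowed function $x_\tau$ as in (11.2).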
Figure 11.1: Local Fourier transform. The prototype function $p(t)$ is centered at $\tau$, and thus, the Fourier transform only sees the neighborhood around $\tau$. For simplicity, a hat prototype function is shown; in practice, smoother ones are used.



From the construction, it is clear why this is called the local Fourier transform, as shown in Figure 11.1. We are now ready to formally define it:



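Definition 11.1 (Local Fourier transform) The local Fourier transform of a function $x(t)$ is a function of $\Omega, \tau \in \mathbb{R}$ given by
$$X(\Omega,\tau) = \langle x, g_{\Omega,\tau} \rangle = \int_{t\in\mathbb{R}} x(t)\,p(t-\tau)\,e^{-j\Omega t}\,dt, \qquad \Omega, \tau \in \mathbb{R}. \qquad (11.5a)$$
The inverse local Fourier transform of $X(\Omega,\tau)$ is
$$x(t) = \frac{1}{2\pi}\int_{\Omega\in\mathbb{R}}\int_{\tau\in\mathbb{R}} X(\Omega,\tau)\,g_{\Omega,\tau}(t)\,d\Omega\,d\tau. \qquad (11.5b)$$
To denote such a local Fourier-transform pair, we write:
$$x(t) \;\stackrel{\text{LFT}}{\longleftrightarrow}\; X(\Omega,\tau).$$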


We will prove the inversion formula (11.5b) in a moment. For the analysis of a function $x(t)$, $X(\Omega,\tau)$ uses time-frequency atoms $g_{\Omega,\tau}(t)$ that are centered around $\Omega$ and $\tau$, as shown schematically in Figure 11.2. The local Fourier transform is highly redundant, mapping a one-dimensional function $x(t)$ into a two-dimensional transform $X(\Omega,\tau)$.

Prototype Window Function The prototype function $p(t)$ is critical in the local Fourier transform. The classical choice for $p(t)$ is the unit-norm version of the Gaussian function given in (3.13a) with $\gamma = (2\alpha/\pi)^{1/4}$:
$$p(t) = \left(\frac{2\alpha}{\pi}\right)^{1/4} e^{-\alpha t^2}, \qquad (11.6)$$



148 Otherwise, it will interfere with the smoothness of the function to be analyzed; see Section 11.2.2.
Figure 11.2: Time-frequency atom used in the local Fourier transform. (a) Time-domain waveform $g_{\Omega,\tau}(t)$. The prototype function is the hat function, and the real and imaginary parts of the complex exponential-modulated prototype function are shown. (b) Schematic time-frequency footprint of $g_{\Omega,\tau}(t)$.



where $\alpha$ is a scale parameter allowing us to tune the time resolution of the local Fourier transform.

Another classic choice is the unit-norm sinc function from (3.75),
$$p(t) = \sqrt{\frac{\omega_0}{2\pi}}\,\frac{\sin(\omega_0 t/2)}{\omega_0 t/2}, \qquad (11.7)$$
that is, a perfect lowpass of bandwidth $|\omega| < \omega_0/2$. Here, the scale parameter $\omega_0$ allows us to tune the frequency resolution of the local Fourier transform.

Other prototype functions of choice include rectangular, triangular (hat) or higher-order spline functions, as well as other classic prototype functions from spectral analysis. An example is the Hann, or raised cosine, window (we have seen its discrete counterpart in (2.15)), its unit-norm version defined as
$$p(t) = \begin{cases} \sqrt{\dfrac{2}{3a}}\,\big(1 + \cos(2\pi t/a)\big), & |t| < a/2; \\ 0, & \text{otherwise}, \end{cases}$$
where $a$ is a scale parameter.$^{149}$
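The normalizations of (11.6), (11.7) and the Hann window can be checked numerically. For the Hann window, for example, $\int_{-a/2}^{a/2} (1 + \cos(2\pi t/a))^2\,dt = 3a/2$, which gives the factor $\sqrt{2/(3a)}$. The sketch below approximates $\|p\|^2$ by Riemann sums on a wide grid. The parameter values and helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_window(t, alpha):
    """Unit-norm Gaussian (11.6): (2 alpha/pi)^{1/4} exp(-alpha t^2)."""
    return (2 * alpha / np.pi) ** 0.25 * np.exp(-alpha * t ** 2)

def sinc_window(t, w0):
    """Unit-norm sinc (11.7); np.sinc(u) = sin(pi u)/(pi u), hence u = w0 t / (2 pi)."""
    return np.sqrt(w0 / (2 * np.pi)) * np.sinc(w0 * t / (2 * np.pi))

def hann_window(t, a):
    """Unit-norm Hann window: sqrt(2/(3a)) (1 + cos(2 pi t/a)) for |t| < a/2, zero otherwise."""
    inside = np.abs(t) < a / 2
    return np.where(inside, np.sqrt(2 / (3 * a)) * (1 + np.cos(2 * np.pi * t / a)), 0.0)

dt = 1e-3
t = np.arange(-400.0, 400.0, dt)   # wide grid, because the sinc window decays only like 1/|t|

def energy(p):
    """Riemann-sum approximation of the squared norm from sampled window values p."""
    return np.sum(p ** 2) * dt

for name, p in [("gaussian", gaussian_window(t, 2.0)),
                ("sinc", sinc_window(t, 2 * np.pi)),
                ("hann", hann_window(t, 3.0))]:
    print(name, energy(p))   # each value is close to 1 (the sinc loses a small tail beyond |t| = 400)
```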



149 In the signal processing literature, the normalization factor is usually 1/2, such that $p(0) = 1$.

Inversion of the Local Fourier Transform While we have taken for granted that the inversion formula (11.5b) holds, this is not a given. However, given the redundancy present in the local Fourier transform, we expect such an inversion to be possible, which we now prove.

We are going to apply the generalized Parseval's equality to (11.5b), and we thus need the Fourier transform of $X(\Omega,\tau)$ with respect to $\tau$. We have that
$$X(\Omega,\tau) = \int_{t\in\mathbb{R}} x(t)\,p(t-\tau)\,e^{-j\Omega t}\,dt \stackrel{(a)}{=} \int_{t\in\mathbb{R}} p(\tau-t)\,x_\Omega(t)\,dt \stackrel{(b)}{=} (p * x_\Omega)(\tau),$$
where (a) follows from $p(t) = p(-t)$, and in (b) we introduced $x_\Omega(t) = x(t)\,e^{-j\Omega t}$. Using the shift-in-frequency property (3.57), the Fourier transform of $x_\Omega(t)$ is $X(\omega + \Omega)$. Then, using the convolution property (3.64), the Fourier transform of $X(\Omega,\tau)$ with respect to $\tau$ becomes
$$X(\Omega,\omega) = P(\omega)\,X(\omega + \Omega). \qquad (11.8)$$

In (11.5b), the other term involving $\tau$ is $g_{\Omega,\tau}(t) = p(t-\tau)\,e^{j\Omega t}$. Using the shift-in-time property (3.56) and because $p(t)$ is symmetric, the Fourier transform of $p(t-\tau)$ with respect to $\tau$ is
$$p(t-\tau) \;\stackrel{\text{FT}}{\longleftrightarrow}\; e^{-j\omega t}\,P(\omega). \qquad (11.9)$$



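We now apply the generalized Parseval's equality (3.69b) to the right side of (11.5b):
$$\begin{aligned}
\frac{1}{2\pi}\int_{\Omega\in\mathbb{R}}\left(\int_{\tau\in\mathbb{R}} X(\Omega,\tau)\,p(t-\tau)\,e^{j\Omega t}\,d\tau\right)d\Omega
&\stackrel{(a)}{=} \frac{1}{2\pi}\int_{\Omega\in\mathbb{R}}\left(\frac{1}{2\pi}\int_{\omega\in\mathbb{R}} X(\omega+\Omega)\,P(\omega)\,P^*(\omega)\,e^{j\omega t}\,e^{j\Omega t}\,d\omega\right)d\Omega \\
&\stackrel{(b)}{=} \frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |P(\omega)|^2\left(\frac{1}{2\pi}\int_{\Omega\in\mathbb{R}} X(\omega+\Omega)\,e^{j(\omega+\Omega)t}\,d\Omega\right)d\omega \\
&\stackrel{(c)}{=} x(t)\,\frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |P(\omega)|^2\,d\omega \;\stackrel{(d)}{=}\; x(t),
\end{aligned}$$
where (a) follows from (11.8), (11.9) and the generalized Parseval's equality (3.69b); (b) from Fubini's theorem (see Appendix 1.A.3) allowing for the exchange of the order of integration; (c) from the inverse Fourier transform (3.48b); and (d) from $p$ being of unit norm and Parseval's equality (3.69a).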

11.2.2 Properties of the Local Fourier Transform 

We now look into the main properties of the local Fourier transform, including 
energy conservation, followed by basic characteristics such as localization properties 
and examples, including spectrograms, which are density plots of the magnitude of 
the local Fourier transform. 

TBD: Table with properties. 

Linearity The local Fourier transform operator is a linear operator, that is,
$$\alpha x(t) + \beta y(t) \;\stackrel{\text{LFT}}{\longleftrightarrow}\; \alpha X(\Omega,\tau) + \beta Y(\Omega,\tau). \qquad (11.10)$$
Shift in Time A shift in time by $t_0$ results in
$$x(t - t_0) \;\stackrel{\text{LFT}}{\longleftrightarrow}\; e^{-j\Omega t_0}\,X(\Omega, \tau - t_0). \qquad (11.11)$$
This is to be expected, as it follows from the shift-in-time property of the Fourier transform, (3.56). To see that,
$$\int_{t\in\mathbb{R}} p(t-\tau)\,x(t-t_0)\,e^{-j\Omega t}\,dt \stackrel{(a)}{=} e^{-j\Omega t_0}\int_{t'\in\mathbb{R}} p\big(t' - (\tau - t_0)\big)\,x(t')\,e^{-j\Omega t'}\,dt' \stackrel{(b)}{=} e^{-j\Omega t_0}\,X(\Omega, \tau - t_0),$$
where (a) follows from the change of variable $t' = t - t_0$, and (b) from the definition of the local Fourier transform (11.5a). Thus, a shift by $t_0$ simply shifts the local Fourier transform and adds a phase factor. The former illustrates the locality of the local Fourier transform, while the latter follows from the equivalent Fourier-transform property.

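Shift in Frequency A shift in frequency by $\omega_0$ results in
$$e^{j\omega_0 t}\,x(t) \;\stackrel{\text{LFT}}{\longleftrightarrow}\; X(\Omega - \omega_0, \tau). \qquad (11.12)$$
To see this,
$$\int_{t\in\mathbb{R}} p(t-\tau)\,e^{j\omega_0 t}\,x(t)\,e^{-j\Omega t}\,dt = \int_{t\in\mathbb{R}} p(t-\tau)\,x(t)\,e^{-j(\Omega-\omega_0)t}\,dt = X(\Omega - \omega_0, \tau),$$
the same as for the Fourier transform. As before, a shift in frequency is often referred to as modulation, and is dual to the shift in time.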
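Both shift properties can be observed numerically by approximating (11.5a) with a Riemann sum. The sketch below assumes a unit-norm Gaussian prototype with $\alpha = 1$, a hypothetical smooth test function $x(t)$, and arbitrary values of $\Omega$, $\tau$, $t_0$, $\omega_0$. The grid is taken wide enough that every integrand is negligible at its ends.

```python
import numpy as np

t = np.linspace(-20.0, 20.0, 40001)      # integration grid
dt = t[1] - t[0]

def p(t):
    """Unit-norm Gaussian prototype (11.6) with alpha = 1 (illustrative choice)."""
    return (2 / np.pi) ** 0.25 * np.exp(-t ** 2)

def lft(x, Omega, tau):
    """Riemann-sum approximation of X(Omega, tau) from (11.5a)."""
    return np.sum(x(t) * p(t - tau) * np.exp(-1j * Omega * t)) * dt

def x(t):
    """Hypothetical smooth, decaying test function."""
    return np.exp(-0.5 * (t - 1.0) ** 2) * np.cos(3 * t)

Omega, tau, t0, w0 = 2.5, 0.7, 1.3, 4.0

def x_shifted(t):
    return x(t - t0)

def x_modulated(t):
    return np.exp(1j * w0 * t) * x(t)

print(np.isclose(lft(x_shifted, Omega, tau),
                 np.exp(-1j * Omega * t0) * lft(x, Omega, tau - t0)))    # shift in time (11.11): True
print(np.isclose(lft(x_modulated, Omega, tau), lft(x, Omega - w0, tau)))  # shift in frequency (11.12): True
```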

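Parseval's Equality The local Fourier-transform operator is a unitary operator (up to the normalization $1/(2\pi)$) and thus preserves the Euclidean norm (see (1.51)):
$$\|x\|^2 = \int_{t\in\mathbb{R}} |x(t)|^2\,dt = \frac{1}{2\pi}\int_{\Omega\in\mathbb{R}}\int_{\tau\in\mathbb{R}} |X(\Omega,\tau)|^2\,d\Omega\,d\tau = \frac{1}{2\pi}\|X\|^2. \qquad (11.13)$$

We now prove Parseval's equality for functions that are both in $\mathcal{L}^1$ and $\mathcal{L}^2$: It should come as no surprise that to derive Parseval's equality for the local Fourier transform, we use Parseval's equality for the Fourier transform. Start with the right side of (11.13) to get
$$\begin{aligned}
\frac{1}{2\pi}\int_{\Omega\in\mathbb{R}}\int_{\tau\in\mathbb{R}} |X(\Omega,\tau)|^2\,d\tau\,d\Omega
&\stackrel{(a)}{=} \frac{1}{2\pi}\int_{\Omega\in\mathbb{R}}\left(\frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |X(\omega+\Omega)\,P(\omega)|^2\,d\omega\right)d\Omega \\
&\stackrel{(b)}{=} \frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |P(\omega)|^2\left(\frac{1}{2\pi}\int_{\Omega\in\mathbb{R}} |X(\omega+\Omega)|^2\,d\Omega\right)d\omega \\
&\stackrel{(c)}{=} \|x\|^2\,\frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |P(\omega)|^2\,d\omega \;\stackrel{(d)}{=}\; \|x\|^2,
\end{aligned}$$
where (a) follows from Parseval's equality (3.69a) and (11.8); (b) from Fubini's theorem (see Appendix 1.A.3) allowing for the exchange of the order of integration; (c) from Parseval's equality (3.69a) applied to $x$; and (d) from $p$ being of unit norm and Parseval's equality (3.69a) applied to $p$.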

Redundancy The local Fourier transform maps a function of one variable into a function of two variables. It is thus highly redundant, and this redundancy is expressed by the reproducing kernel:
$$K(\Omega,\tau,\omega_0,t_0) = \langle g_{\omega_0,t_0}, g_{\Omega,\tau} \rangle = \int_{t\in\mathbb{R}} p(t-\tau)\,p(t-t_0)\,e^{j(\omega_0-\Omega)t}\,dt. \qquad (11.14)$$

While this is a four-dimensional object, its magnitude depends only on the two differences $(\omega_0 - \Omega)$ and $(t_0 - \tau)$.$^{150}$

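Proposition 11.2 (Reproducing kernel formula for the local Fourier transform) A function $X(\omega_0,t_0)$ is the local Fourier transform of some function $x(t)$ if and only if it satisfies
$$X(\Omega,\tau) = \frac{1}{2\pi}\int_{t_0\in\mathbb{R}}\int_{\omega_0\in\mathbb{R}} X(\omega_0,t_0)\,K(\Omega,\tau,\omega_0,t_0)\,d\omega_0\,dt_0. \qquad (11.15)$$

Proof. If $X(\omega_0,t_0)$ is a local Fourier transform, then there is a function $x(t)$ whose local Fourier transform is $X(\omega_0,t_0)$, and
$$\begin{aligned}
X(\Omega,\tau) &= \int_{t\in\mathbb{R}} x(t)\,g^*_{\Omega,\tau}(t)\,dt
\stackrel{(a)}{=} \int_{t\in\mathbb{R}}\left(\frac{1}{2\pi}\int_{t_0\in\mathbb{R}}\int_{\omega_0\in\mathbb{R}} X(\omega_0,t_0)\,g_{\omega_0,t_0}(t)\,d\omega_0\,dt_0\right)g^*_{\Omega,\tau}(t)\,dt \\
&\stackrel{(b)}{=} \frac{1}{2\pi}\int_{t_0\in\mathbb{R}}\int_{\omega_0\in\mathbb{R}} X(\omega_0,t_0)\left(\int_{t\in\mathbb{R}} g_{\omega_0,t_0}(t)\,g^*_{\Omega,\tau}(t)\,dt\right)d\omega_0\,dt_0 \\
&= \frac{1}{2\pi}\int_{t_0\in\mathbb{R}}\int_{\omega_0\in\mathbb{R}} X(\omega_0,t_0)\,K(\Omega,\tau,\omega_0,t_0)\,d\omega_0\,dt_0,
\end{aligned}$$
where (a) follows from the inversion formula (11.5b), (b) from exchanging the order of integration, and the last equality from the definition of the reproducing kernel (11.14).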

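For the converse, write (11.15) by making $K(\Omega,\tau,\omega_0,t_0)$ explicit as an integral over $t$ (see (11.14)):
$$\begin{aligned}
X(\Omega,\tau) &= \frac{1}{2\pi}\int_{t_0\in\mathbb{R}}\int_{\omega_0\in\mathbb{R}}\int_{t\in\mathbb{R}} X(\omega_0,t_0)\,g_{\omega_0,t_0}(t)\,g^*_{\Omega,\tau}(t)\,dt\,d\omega_0\,dt_0 \\
&\stackrel{(a)}{=} \int_{t\in\mathbb{R}} g^*_{\Omega,\tau}(t)\left(\frac{1}{2\pi}\int_{t_0\in\mathbb{R}}\int_{\omega_0\in\mathbb{R}} X(\omega_0,t_0)\,g_{\omega_0,t_0}(t)\,d\omega_0\,dt_0\right)dt
\;\stackrel{(b)}{=}\; \int_{t\in\mathbb{R}} g^*_{\Omega,\tau}(t)\,x(t)\,dt,
\end{aligned}$$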



150 This is captured by a closely related function called the ambiguity function; see Exercise 11.1.
Figure 11.3: Localization properties of the local Fourier transform. (a) A function perfectly localized in time, a Dirac delta function at $\tau$, with a compactly supported prototype function $[-T/2, T/2]$. (b) A function perfectly localized in frequency, a complex exponential function of frequency $\omega$, with a prototype function having a compactly supported Fourier transform $[-B/2, B/2]$.



where (a) follows from Fubini's theorem (see Appendix 1.A.3) allowing for the exchange of the order of integration, and (b) from the inversion formula (11.5b), which defines $x(t)$. Therefore, $X(\Omega,\tau)$ is indeed a local Fourier transform, namely the local Fourier transform of $x(t)$.

The redundancy present in the local Fourier transform allows sampling and interpolation, and the interpolation kernel depends on the reproducing kernel.



Characterization of Singularities and Smoothness To characterize singularities, we will take the view that the local Fourier transform is the Fourier transform of a windowed function $x_\tau(t)$ as in (11.1). Since this is a product between the function and the prototype function, by the convolution-in-frequency property (3.65) it becomes a convolution in the Fourier domain. That is, singularities are smoothed by the prototype function.

We now characterize singularities in time and frequency, as depicted in Figure 11.3.

(i) Characterization of singularities in time: Take a function perfectly localized in time, the Dirac delta function $x(t) = \delta(t - t_0)$. Then
$$X(\Omega,\tau) = \int_{t\in\mathbb{R}} p(t-\tau)\,\delta(t-t_0)\,e^{-j\Omega t}\,dt \stackrel{(a)}{=} p(t_0-\tau)\,e^{-j\Omega t_0},$$
where (a) follows from Table 3.1. This illustrates the characterization of singularities in time by the local Fourier transform: An event at time location $t_0$ will spread around $t_0$ according to the prototype function, and this across all frequencies. If $p(t)$ has compact support $[-T/2, T/2]$, then $X(\Omega,\tau)$ has support $(-\infty, \infty) \times [t_0 - T/2, t_0 + T/2]$.
(ii) Characterization of singularities in frequency: Take now a function perfectly localized in frequency, a complex exponential function $x(t) = e^{j\omega t}$. Then,
$$X(\Omega,\tau) = \int_{t\in\mathbb{R}} p(t-\tau)\,e^{-j(\Omega-\omega)t}\,dt \stackrel{(a)}{=} e^{-j(\Omega-\omega)\tau}\int_{t'\in\mathbb{R}} p(t')\,e^{-j(\Omega-\omega)t'}\,dt' \stackrel{(b)}{=} e^{-j(\Omega-\omega)\tau}\,P(\Omega-\omega), \qquad (11.16)$$
where (a) follows from the change of variables $t' = t - \tau$; and (b) from the Fourier transform of $p(t)$. This illustrates the characterization of singularities in frequency by the local Fourier transform: An event at frequency location $\omega$ will spread around $\omega$ according to the prototype function, and this across all time. If $P(\omega)$ has compact support $[-B/2, B/2]$, then $X(\Omega,\tau)$ has support $[\omega - B/2, \omega + B/2] \times (-\infty, \infty)$.
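Equation (11.16) can be compared against a direct numerical evaluation of (11.5a). The sketch below assumes the unit-norm Gaussian prototype (11.6) with the illustrative value $\alpha = 0.5$, whose Fourier transform is $P(\omega) = (2\alpha/\pi)^{1/4}\sqrt{\pi/\alpha}\,e^{-\omega^2/(4\alpha)}$. It also shows that $|X(\Omega,\tau)| = |P(\Omega - \omega)|$ does not depend on $\tau$.

```python
import numpy as np

alpha = 0.5                               # Gaussian scale parameter (illustrative)
t = np.linspace(-40.0, 40.0, 80001)
dt = t[1] - t[0]

def p(t):
    """Unit-norm Gaussian prototype (11.6)."""
    return (2 * alpha / np.pi) ** 0.25 * np.exp(-alpha * t ** 2)

def P(w):
    """Fourier transform of p: (2 alpha/pi)^{1/4} sqrt(pi/alpha) exp(-w^2 / (4 alpha))."""
    return (2 * alpha / np.pi) ** 0.25 * np.sqrt(np.pi / alpha) * np.exp(-w ** 2 / (4 * alpha))

w = 3.0                                   # frequency of x(t) = exp(j w t)
checks = []
for Omega, tau in [(3.0, 0.0), (3.5, 2.0), (5.0, -1.0)]:
    numeric = np.sum(p(t - tau) * np.exp(1j * w * t) * np.exp(-1j * Omega * t)) * dt
    closed_form = np.exp(-1j * (Omega - w) * tau) * P(Omega - w)
    checks.append(abs(numeric - closed_form) < 1e-8)
    print(Omega, tau, abs(numeric))       # equals |P(Omega - w)|, whatever tau is
print(all(checks))                        # True
```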

What is important to understand is that if singularities appear together within a 
prototype function, they appear mixed in the local Fourier transform domain. This 
is unlike the continuous wavelet transform we will see in the next chapter, where 
arbitrary time resolution is possible for the scale factor going to 0. 

If the prototype function is smoother than the function to be analyzed, then 
the type of singularity (assuming there is a single one inside the prototype function) 
is determined by the decay of the Fourier transform. 

Example 11.1 (Singularity characterization of the local Fourier transform) Let us consider, as an illustrative example, a hat prototype function from (3.49a):
$$p(t) = \begin{cases} 1 - |t|, & |t| < 1; \\ 0, & \text{otherwise}, \end{cases}$$
which has a Fourier transform (3.49b) decaying as $|\omega|^{-2}$ for large $\omega$.

Consider a function $x(t) \in C^1$ (continuous and with at least one continuous derivative, see Section 1.2.4) except for a discontinuity at $t = t_0$. If it were not for the discontinuity, the Fourier transform of $x(t)$ would decay faster than $|\omega|^{-2}$ (that is, faster than $|P(\omega)|$ does). However, because of the singularity at $t = t_0$, $|X(\omega)|$ decays only as $|\omega|^{-1}$.

Now the locality of the local Fourier transform comes into play. There are two modes, given by the regularity of the windowed function $x_\tau(t)$: (1) When $\tau$ is far from $t_0$, $|\tau - t_0| > 1$, $x_\tau(t)$ is continuous (but its derivative is not, because of the hat prototype function), and $|X(\Omega,\tau)|$ decays as $|\Omega|^{-2}$. (2) When $\tau$ is close to $t_0$, $|\tau - t_0| < 1$, that is, close to the discontinuity, $x_\tau(t)$ is discontinuous, and $|X(\Omega,\tau)|$ decays only as $|\Omega|^{-1}$.
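The two decay modes of Example 11.1 can be observed numerically. The sketch assumes the hypothetical test function $x(t) = \cos t + u(t)$, which is smooth except for a unit jump at $t_0 = 0$, and evaluates $X(\Omega,\tau)$ with the hat prototype by the midpoint rule over the window support. For the window straddling the jump, $\Omega\,|X(\Omega,0)|$ should approach the jump size times $p(0)$, that is 1. For the window at $\tau = 3$, $\Omega^2\,|X(\Omega,3)|$ should stay bounded.

```python
import numpy as np

def hat(t):
    """Hat prototype of Example 11.1: 1 - |t| for |t| < 1, zero otherwise."""
    return np.maximum(1.0 - np.abs(t), 0.0)

def x_jump(t):
    """Hypothetical test function: smooth except for a unit jump at t0 = 0."""
    return np.cos(t) + (t >= 0)

def lft_hat(x, Omega, tau, n=200_000):
    """Midpoint-rule approximation of X(Omega, tau) over the window support [tau - 1, tau + 1]."""
    dt = 2.0 / n
    t = tau - 1.0 + (np.arange(n) + 0.5) * dt
    return np.sum(x(t) * hat(t - tau) * np.exp(-1j * Omega * t)) * dt

for Omega in (100.0, 200.0, 400.0):
    near = abs(lft_hat(x_jump, Omega, 0.0))   # window contains the jump: decay like 1/Omega
    far = abs(lft_hat(x_jump, Omega, 3.0))    # window away from the jump: decay like 1/Omega^2
    print(Omega, near * Omega, far * Omega ** 2)
```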

The above example indicates that there is a subtle interplay between the smoothness and support of the prototype function, and the singularities or smoothness of the analyzed function. This is formalized in the following two results:
Proposition 11.3 (Singularity characterization of the local Fourier transform) Assume a prototype function $p(t)$ with compact support $[-T/2, T/2]$ and sufficient smoothness. Consider a function $x(t)$ which is smooth except for a singularity of order $n$ at $t = t_0$, that is, its $n$th derivative at $t_0$ is a Dirac delta function. Then its local Fourier transform decays as
$$|X(\Omega,\tau)| \sim O\left(\frac{1}{1 + |\Omega|^{n}}\right)$$
in the region $\tau \in [t_0 - T/2, t_0 + T/2]$.



The proof follows by using the decay property of the Fourier transform applied to the windowed function and is left as Exercise 11.2.

Conversely, a sufficiently decaying local Fourier transform indicates a smooth 
function in the region of interest. 



Proposition 11.4 (Smoothness from decay of the local Fourier transform) Consider a sufficiently smooth prototype function $p(t)$ of compact support $[-T/2, T/2]$. If the local Fourier transform at $t_0$ decays sufficiently fast, that is, for some $\alpha$ and $\epsilon > 0$,
$$|X(\Omega,t_0)| \le \frac{\alpha}{1 + |\Omega|^{p+1+\epsilon}},$$
then $x(t)$ is $C^p$ on the interval $[t_0 - T/2, t_0 + T/2]$.



Spectrograms The standard way to display the local Fourier transform is as a density plot of $|X(\Omega,\tau)|$. This is called the spectrogram and is very popular, for example, for speech and music signals. Figure 11.4 shows a standard signal with various modes and two spectrograms.

As can be seen, the sinusoid is visible, and the singularities are identified but not exactly localized due to the size of the prototype function. For the short prototype function (Figure 11.4(b)), various singularities are still isolated, but the sinusoid is not well localized. The reverse is true for the long prototype function (Figure 11.4(c)), where the sinusoid is well identified, but some of the singularities are now mixed together. This is of course the fundamental tension between time and frequency localization, as governed by the uncertainty principle we have seen in Chapter 6.
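The same tension shows up in a discrete-time sketch of a spectrogram: sample the signal, window it with a shifted prototype, and take the magnitude of a DFT of each windowed segment. The sketch assumes a unit-norm Hann window, a hop of 16 samples, and a hypothetical test signal made of a 100 Hz sinusoid plus an impulse at 0.5 s. The helper `spectrogram` is illustrative and is not a library API.

```python
import numpy as np

def spectrogram(x, fs, win_len, hop):
    """Magnitude of a sampled, discrete-time local Fourier transform.

    Uses a unit-norm Hann window of win_len samples advanced by hop samples.
    Returns frame-center times (s), frequencies (Hz), and S[k, m] = |DFT bin k of frame m|.
    """
    w = np.hanning(win_len)
    w = w / np.linalg.norm(w)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    S = np.array([np.abs(np.fft.rfft(x[s:s + win_len] * w)) for s in starts]).T
    times = (starts + win_len / 2) / fs
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    return times, freqs, S

fs = 1000
n = np.arange(fs)                            # one second of samples
x = np.sin(2 * np.pi * 100 * n / fs)         # a 100 Hz sinusoid ...
x[500] += 5.0                                # ... plus an impulse at t = 0.5 s

for win_len in (32, 256):                    # short and long prototype windows
    times, freqs, S = spectrogram(x, fs, win_len, hop=16)
    quiet = np.argmin(np.abs(times - 0.2))   # a frame away from the impulse
    print(win_len, "Hz per bin:", freqs[1], "sinusoid peak at:", freqs[np.argmax(S[:, quiet])])
```

The short window resolves frequency only to about 31 Hz per bin but confines the impulse to a few frames. The long window resolves about 4 Hz per bin, and the impulse then smears over many frames.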
Figure 11.4: The spectrogram. (a) A signal with various modes. (b) The spectrogram, or $|X(\Omega,\tau)|$, with a short prototype function. (c) The spectrogram with a long prototype function.



11.3 Local Fourier Frame Series 

11.3.1 Sampling Grids 

11.3.2 Frames from Sampled Local Fourier Transform 

11.4 Local Fourier Series 

11.4.1 Complex Exponential-Modulated Local Fourier Bases 
Complex-Exponential Modulation 

Balian-Low Theorem 

11.4.2 Cosine-Modulated Local Fourier Bases 

Cosine Modulation 
Local Cosine Bases

11.5 Computational Aspects 

11.5.1 Complex Exponential-Modulated Local Fourier Bases 

11.5.2 Cosine-Modulated Local Fourier Bases 
Chapter at a Glance 

TBD 

Historical Remarks 

TBD 

Further Reading 

TBD 

Exercises with Solutions 

11.1. TBD 

Exercises 

11.1. Ambiguity Function and Reproducing Kernel of the Local Fourier Transform 

11.2. Smoothness and Decay of the Local Fourier Transform 
Prove Propositions 11.3 and 11.4.

(Hint: Use appropriate Fourier-transform relations. Make sure the smoothness of the window function is properly taken into account.)
Chapter 12

Wavelet Bases, Frames 
and Transforms on 
Functions 



The previous chapter started with the most redundant version of the local Fourier 
expansions on functions: the local Fourier transform. We lowered its redundancy 
through sampling, leading to Fourier frames. Ultimately, we wanted to make it 
nonredundant by trying to build local Fourier bases; however, we hit a roadblock, 
the Balian-Low theorem, prohibiting such bases with reasonable joint time and 
frequency localization. While bases are possible with cosine, instead of complex- 
exponential, modulation, we can do even better. In this chapter, we will start by 
constructing wavelet bases, and then go in the direction of increasing redundancy, 
by building frames and finally the continuous wavelet transform. 

12.1 Introduction 

Iterated filter banks from Chapter 9 pose interesting theoretical and practical questions, the key one quite simple: what happens if we iterate the DWT to infinity? While we need to make the question precise by indicating how this iterative process takes place, when done properly, and under certain conditions on the filters used in the filter bank, the limit leads to a wavelet basis for the space of square-integrable functions, $\mathcal{L}^2(\mathbb{R})$. The key notion is that we take a discrete-time basis (orthonormal or biorthogonal) for $\ell^2(\mathbb{Z})$, and derive from it a continuous-time basis for $\mathcal{L}^2(\mathbb{R})$. This connection between discrete and continuous time is reminiscent of the concepts and aim of Chapter 4, including the sampling theorem. The iterative process itself is fascinating, but the resulting bases are even more so: they are scale invariant (as opposed to shift invariant) so that all basis vectors are obtained from a single function $\psi(t)$ through shifting and scaling. What we do in this opening section is go through some salient points on a simple example we have seen numerous times in Part II: the Haar basis. We will start from its discrete-time version seen in Chapter 7 (level 1, scale 0) and the iterated one seen in Chapter 9 (level $J$, scale $2^J$) and build a continuous-time basis for $\mathcal{L}^2(\mathbb{R})$. We then mimic this process and show how it can lead to a wealth of different wavelet bases. We will also look into the Haar frame and the Haar continuous wavelet transform. We follow the same roadmap from this section, iterated filters → wavelet series → wavelet frame series → continuous wavelet transform, throughout the rest of the chapter, but in a more general setting. As the chapter contains a fair amount of material, some of it quite technical, this section attempts to cover all the main concepts, and is thus rather long. The details in more general settings are covered throughout the rest of the chapter.



12.1.1 Scaling Function and Wavelets from Haar Filter Bank 

To set the stage, we start with the Haar filters $g$ and $h$ given in Table 7.8, Chapter 7, where we used their impulse responses and shifts by multiples of two as a basis for $\ell^2(\mathbb{Z})$. This orthonormal basis was implemented using a critically sampled two-channel filter bank with down- and upsampling by 2, synthesis lowpass/highpass filter pair $g_n$, $h_n$ from (7.1) (repeated here for easy reference)
$$g_n = \frac{1}{\sqrt{2}}(\delta_n + \delta_{n-1}) \;\stackrel{\text{ZT}}{\longleftrightarrow}\; G(z) = \frac{1}{\sqrt{2}}(1 + z^{-1}), \qquad (12.1a)$$
$$h_n = \frac{1}{\sqrt{2}}(\delta_n - \delta_{n-1}) \;\stackrel{\text{ZT}}{\longleftrightarrow}\; H(z) = \frac{1}{\sqrt{2}}(1 - z^{-1}), \qquad (12.1b)$$
and a corresponding analysis lowpass/highpass filter pair $g_{-n}$, $h_{-n}$.

We then used these filters and the associated two-channel filter bank as a building block for the Haar DWT in Chapter 9. For example, we saw that in a 3-level iterated Haar filter bank, the lowpass and highpass at level 3 were given by
(9Tcl)-( ]9Tdl and plotted in Figure El 



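$$\begin{aligned}
G^{(3)}(z) &= G(z)\,G(z^2)\,G(z^4) = 2^{-3/2}\,(1 + z^{-1})(1 + z^{-2})(1 + z^{-4}) \\
&= 2^{-3/2}\,(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} + z^{-6} + z^{-7}), && (12.2a) \\
H^{(3)}(z) &= G(z)\,G(z^2)\,H(z^4) = 2^{-3/2}\,(1 + z^{-1})(1 + z^{-2})(1 - z^{-4}) \\
&= 2^{-3/2}\,(1 + z^{-1} + z^{-2} + z^{-3} - z^{-4} - z^{-5} - z^{-6} - z^{-7}). && (12.2b)
\end{aligned}$$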



Iterated Filters We now revisit these filters and their iterations, but with a new 
angle, as we let the iteration go to infinity by associating a continuous-time function 
to the discrete-time sequence (impulse response of the iterated filter). 

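We first write the expressions for the equivalent filters at the last level of a $J$-level iterated Haar filter bank:
$$G^{(J)}(z) = \prod_{\ell=0}^{J-1} G\big(z^{2^\ell}\big) = 2^{-J/2}\prod_{\ell=0}^{J-1}\big(1 + z^{-2^\ell}\big) = 2^{-J/2}\sum_{n=0}^{2^J-1} z^{-n} = G^{(J-1)}(z)\,\frac{1}{\sqrt{2}}\big(1 + z^{-2^{J-1}}\big), \qquad (12.3a)$$
$$\begin{aligned}
H^{(J)}(z) &= \prod_{\ell=0}^{J-2} G\big(z^{2^\ell}\big)\,H\big(z^{2^{J-1}}\big) = 2^{-J/2}\prod_{\ell=0}^{J-2}\big(1 + z^{-2^\ell}\big)\big(1 - z^{-2^{J-1}}\big) \\
&= 2^{-J/2}\Big(\sum_{n=0}^{2^{J-1}-1} z^{-n} - \sum_{n=2^{J-1}}^{2^J-1} z^{-n}\Big) = G^{(J-1)}(z)\,\frac{1}{\sqrt{2}}\big(1 - z^{-2^{J-1}}\big). \qquad (12.3b)
\end{aligned}$$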

We have seen the above expressions in (9.5c) and (9.9) already; they construct the equivalent filter at the subsequent level from the equivalent filters at the previous one.

We know that, by construction, these filters are orthonormal to their shifts by $2^J$, (9.6a), (9.11a), as well as orthogonal to each other, (9.14a), and their lengths are
$$L^{(J)} = 2^J. \qquad (12.4)$$
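The recursion in (12.3) is easy to run numerically. The sketch below builds $G^{(J)}$ and $H^{(J)}$ by multiplying upsampled copies of the Haar filters (12.1), that is, by convolving their tap sequences. It then lets one check the length (12.4), unit norm, and orthogonality of $g^{(J)}$ and $h^{(J)}$. The helper names are hypothetical.

```python
import numpy as np

def upsample_taps(h, m):
    """Taps of H(z^m): insert m - 1 zeros between consecutive taps of h."""
    out = np.zeros((len(h) - 1) * m + 1)
    out[::m] = h
    return out

def iterated_haar(J):
    """Equivalent filters g^(J), h^(J) of a J-level iterated Haar filter bank, built as in (12.3)."""
    g = np.array([1.0, 1.0]) / np.sqrt(2)
    h = np.array([1.0, -1.0]) / np.sqrt(2)
    prefix = np.array([1.0])                      # G(z) G(z^2) ... G(z^{2^{J-2}})
    for ell in range(J - 1):
        prefix = np.convolve(prefix, upsample_taps(g, 2 ** ell))
    gJ = np.convolve(prefix, upsample_taps(g, 2 ** (J - 1)))
    hJ = np.convolve(prefix, upsample_taps(h, 2 ** (J - 1)))
    return gJ, hJ

for J in (1, 2, 3, 4):
    gJ, hJ = iterated_haar(J)
    print(J, len(gJ), np.linalg.norm(gJ), np.dot(gJ, hJ))   # length 2^J, norm 1, inner product 0
```

For $J = 3$ the two arrays reproduce the coefficients of (12.2a) and (12.2b).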

Scaling Function and its Properties We now associate a piecewise-constant function $\varphi^{(J)}(t)$ to $g^{(J)}_n$ so that $\varphi^{(J)}(t)$ is of finite length and norm 1; we thus have to determine the width and height of the piecewise segments. Since the number of piecewise segments (equal to the number of nonzero coefficients of $g^{(J)}_n$) grows exponentially with $J$ because of (12.4), we choose their width as $2^{-J}$, upper bounding the length of $\varphi^{(J)}(t)$ by 1. For $\varphi^{(J)}(t)$ to inherit the unit-norm property from $g^{(J)}$, we choose the height of the piecewise segments as $2^{J/2}g^{(J)}_n$. Then, the $n$th piece of $\varphi^{(J)}(t)$ contributes
$$\int_{n/2^J}^{(n+1)/2^J} |\varphi^{(J)}(t)|^2\,dt = \int_{n/2^J}^{(n+1)/2^J} 2^J\,\big(g^{(J)}_n\big)^2\,dt = \big(g^{(J)}_n\big)^2 \stackrel{(a)}{=} 2^{-J}$$
to $\|\varphi^{(J)}\|^2$ (where (a) follows from (12.3a)). Summing up the individual contributions,
$$\|\varphi^{(J)}\|^2 = \sum_{n=0}^{2^J-1}\int_{n/2^J}^{(n+1)/2^J} |\varphi^{(J)}(t)|^2\,dt = \sum_{n=0}^{2^J-1} 2^{-J} = 1,$$
as in Figure 12.1. We have thus defined our piecewise-constant function as
$$\varphi^{(J)}(t) = 2^{J/2}\,g^{(J)}_n = 1, \qquad \frac{n}{2^J} \le t < \frac{n+1}{2^J}. \qquad (12.5)$$
Figure 12.1: Example construction of a piecewise-constant function $\varphi^{(J)}(t)$ from $g^{(J)}_n$. (a) Discrete-time sequence $2^{J/2}g^{(J)}_n$. (b) Continuous-time piecewise-constant function $\varphi^{(J)}(t)$ (we plot a few isolated piecewise segments for emphasis).

Figure 12.2: Magnitude response $|\Phi(\omega)|$ of the scaling function $\varphi(t)$.



As $\varphi^{(J)}(t)$ is 1 on every interval of length $2^{-J}$ and $g^{(J)}$ has exactly $2^J$ nonzero entries (see (12.4)), this function is actually 1 on the interval $[0, 1)$ for every $J$, that is, the limit of $\varphi^{(J)}(t)$ is the indicator function of the unit interval $[0, 1)$ (or, a box function shifted to 1/2), $\varphi(t)$, independently of $J$,
$$\varphi^{(J)}(t) = \begin{cases} 1, & 0 \le t < 1; \\ 0, & \text{otherwise} \end{cases} \;=\; \varphi(t). \qquad (12.6)$$
Convergence is achieved without any problem, actually in one step!$^{151}$

The function $\varphi(t)$ is called the Haar scaling function. Had we started with a different lowpass filter $g$, the resulting limit, had it existed, would have led to a different scaling function, a topic we will address later in the chapter.

In the Fourier domain, $\Phi^{(J)}(\omega)$, the Fourier transform of $\varphi^{(J)}(t)$, will be the same for every $J$ because of (12.6), and thus, the Fourier transform of the scaling function will be the sinc function in frequency (see Table 3.6 and Figure 12.2):
$$\Phi(\omega) = e^{-j\omega/2}\,\frac{\sin(\omega/2)}{\omega/2} = e^{-j\omega/2}\,\mathrm{sinc}(\omega/2). \qquad (12.7)$$



151 Although we could have defined a piecewise-linear function instead of a piecewise-constant one, we chose not to do so, as the behavior of the limit we will study does not change.

Figure 12.3: Two-scale equation for the Haar scaling function. (a) The scaling function $\varphi(t)$ and (b) expressed as a linear combination of $\varphi(2t)$ and $\varphi(2t-1)$.

We now turn our attention to some interesting properties of the scaling function:
(i) Two-scale equation: The Haar scaling function $\varphi(t)$ satisfies

$$\varphi(t) = \sqrt{2}\,\big(g_0\, \varphi(2t) + g_1\, \varphi(2t-1)\big) = \varphi(2t) + \varphi(2t-1), \tag{12.8}$$

the so-called two-scale equation. We see that the scaling function is built out of two scaled versions of itself, illustrated in Figure 12.3. While in this Haar case this does not come as a big surprise, it will when the scaling functions become more complex. To find the expression for the two-scale equation in the Fourier domain, we rewrite (12.8) as a convolution

$$\varphi(t) = \varphi(2t) + \varphi(2t-1) = \sqrt{2} \sum_{k=0}^{1} g_k\, \varphi(2t-k). \tag{12.9}$$



We can then use the convolution-in-time property (3.64) and the scaling property (3.58a) of the Fourier transform to get

$$\Phi(\omega) = \frac{1}{\sqrt{2}}\, G(e^{j\omega/2})\, \Phi(\omega/2) = \frac{1}{2}\big(1 + e^{-j\omega/2}\big)\, e^{-j\omega/4} \operatorname{sinc}(\omega/4). \tag{12.10}$$



(ii) Smoothness: The Haar scaling function $\varphi(t)$ is not continuous.$^{152}$ This can also be seen from the decay of its Fourier transform $\Phi(\omega)$, which, as we know from (12.7), is a sinc function (see also Figure 12.2), and thus decays slowly, only as $1/|\omega|$ (it has, however, only two points of discontinuity and is, therefore, not altogether ill-behaved).

(iii) Reproduction of polynomials: The Haar scaling function $\varphi(t)$ with its integer shifts can reproduce constant functions. This stems from the polynomial approximation properties of $g$, as in Theorem 7.5. In the next section, we will see how other scaling functions will be able to reproduce polynomials of degree $N$. The key will be the number of zeros at $\omega = \pi$ of the lowpass filter $G(e^{j\omega})$; from (12.1a), for the Haar scaling function, there is just 1.

(iv) Orthogonality to integer shifts: The Haar scaling function $\varphi(t)$ is orthogonal to its integer shifts, another property inherited from the underlying filter. Of course, in this Haar case the property is obvious, as the support of $\varphi(t)$ is limited to the unit interval. The property will still hold for more general scaling functions, albeit less obviously.
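As a quick numerical sanity check of (12.7) and (12.10), the sketch below evaluates both sides in plain Python and also compares (12.7) against a direct numerical Fourier integral of the box function. It assumes the Haar lowpass response $G(e^{j\omega}) = (1 + e^{-j\omega})/\sqrt{2}$ and the convention $\operatorname{sinc}(x) = \sin(x)/x$; the helper names are illustrative only.

```python
import cmath
import math

def sinc(x):
    """sinc(x) = sin(x)/x with sinc(0) = 1, the convention used in (12.7)."""
    return 1.0 if x == 0 else math.sin(x) / x

def phi_hat(w):
    """Fourier transform of the Haar scaling function, (12.7)."""
    return cmath.exp(-1j * w / 2) * sinc(w / 2)

def G(w):
    """Assumed Haar lowpass response G(e^{jw}) = (1 + e^{-jw}) / sqrt(2)."""
    return (1 + cmath.exp(-1j * w)) / math.sqrt(2)

def two_scale_rhs(w):
    """Right-hand side of (12.10): (1/sqrt(2)) G(e^{jw/2}) Phi(w/2)."""
    return G(w / 2) * phi_hat(w / 2) / math.sqrt(2)

def direct_ft(w, n=2000):
    """Midpoint-rule approximation of the integral of e^{-jwt} over [0, 1)."""
    h = 1.0 / n
    return sum(cmath.exp(-1j * w * (m + 0.5) * h) for m in range(n)) * h

freqs = [0.0, 0.7, math.pi, 5.0, 12.3]
max_err_two_scale = max(abs(phi_hat(w) - two_scale_rhs(w)) for w in freqs)
max_err_direct = max(abs(phi_hat(w) - direct_ft(w)) for w in freqs)
print(max_err_two_scale, max_err_direct, abs(G(math.pi)))
```

The last printed value is the size of $G(e^{j\pi})$, which should be zero up to rounding: the single zero at $\omega = \pi$ mentioned in (iii).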



152 In Proposition 12.1 we will see a sufficient condition for the limit function, if it exists, to be continuous (and possibly $k$-times differentiable).
Figure 12.4: Two-scale equation for the Haar wavelet. (a) The wavelet $\psi(t)$ and (b) expressed as a linear combination of $\varphi(2t)$ and $\varphi(2t-1)$.



Wavelet and its Properties The scaling function we have just seen is lowpass in nature (if the underlying filter $g$ is lowpass in nature). Similarly, we can construct a wavelet function (or, simply, wavelet) that will be bandpass in nature (if the underlying filter $h$ is highpass in nature).

We thus associate a piecewise-constant function $\psi^{(J)}(t)$ to $h_n^{(J)}$ in such a way that $\psi^{(J)}(t)$ is of finite length and of norm 1; we use the same arguments as before to determine the width and height of the piecewise segments, leading to

$$\psi^{(J)}(t) = 2^{J/2} h_n^{(J)}, \qquad \frac{n}{2^J} \le t < \frac{n+1}{2^J}. \tag{12.11}$$

Like $\varphi^{(J)}(t)$, the function $\psi^{(J)}(t)$ is again the same for every $J$, since the length of $h^{(J)}$ is exactly $2^J$. It is 1 for $n = 0, 1, \ldots, 2^{J-1}-1$, and is $-1$ for $n = 2^{J-1}, \ldots, 2^J-1$. Thus, it comes as no surprise that the limit is

$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2; \\ -1, & 1/2 \le t < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{12.12}$$

called the Haar wavelet$^{153}$ (see Figure 12.4(a)). Similarly to $\Phi(\omega)$, in the Fourier domain,

$$\Psi(\omega) = \frac{1}{2}\big(1 - e^{-j\omega/2}\big)\, e^{-j\omega/4} \operatorname{sinc}(\omega/4) = \frac{1}{\sqrt{2}}\, H(e^{j\omega/2})\, \Phi(\omega/2). \tag{12.13}$$



We now turn our attention to some interesting properties of the Haar wavelet:

(i) Two-scale equation: We can see from $\Psi(\omega)$ its highpass nature, as it is 0 at $\omega = 0$ (because $H(1) = 0$). Using the convolution-in-time property (3.64) and the scaling property (3.58a) of the Fourier transform, we see that the above can be written in the time domain as (see Figure 12.4(b))

$$\psi(t) = \sqrt{2} \sum_{k=0}^{1} h_k\, \varphi(2t-k) = \varphi(2t) - \varphi(2t-1). \tag{12.14}$$



In other words, the wavelet is built out of the scaling function at a different scale and its shift; this is its own two-scale equation, but involving scaled versions of the scaling function instead of itself. The last expression in (12.13) is the Fourier-domain version of the two-scale equation.

153 Depending on the initial discrete-time filters, the resulting limits, when they exist, lead to different wavelets.

(ii) Smoothness: Since the wavelet is a linear combination of the scaled scaling 
function and its shift, its smoothness is inherited from the scaling function; in 
other words, like the scaling function, it is not continuous, having 3 points of 
discontinuity. 

(iii) Zero-moment property: We have seen that the Haar scaling function can reproduce constant functions. The Haar wavelet has a complementary property, called the zero-moment property. To see that, note that

$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0,$$

that is, the inner product between the wavelet and a constant function will be zero. In other words, the wavelet annihilates constant functions while the scaling function reproduces them.
(iv) Orthogonality to integer shifts: Finally, like the scaling function, the wavelet is orthogonal to its integer shifts. Again, this is trivial to see for the Haar wavelet, as it is supported on the unit interval only.
(v) Orthogonality of the scaling function and the wavelet: It is also trivial to see that the scaling function and the wavelet are orthogonal to each other. All these properties set the stage for us to build a basis from these functions.
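These properties are easy to confirm mechanically. The following sketch represents real piecewise-constant functions as lists of (start, end, value) pieces with exact rational endpoints, and checks the zero moment, the unit norm, and the orthogonality of $\psi(t)$ to its integer shifts and to $\varphi(t-k)$. The representation and helper names are our own, chosen for illustration.

```python
from fractions import Fraction as F

# A real piecewise-constant function is a list of (start, end, value), start < end.
PHI = [(F(0), F(1), 1)]                          # Haar scaling function (12.6)
PSI = [(F(0), F(1, 2), 1), (F(1, 2), F(1), -1)]  # Haar wavelet (12.12)

def shift(f, k):
    """Return the pieces of t -> f(t - k)."""
    return [(a + k, b + k, v) for a, b, v in f]

def inner(f, g):
    """Exact inner product <f, g> of two real piecewise-constant functions."""
    total = F(0)
    for a1, b1, v1 in f:
        for a2, b2, v2 in g:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo < hi:  # overlapping pieces contribute (overlap length) * v1 * v2
                total += (hi - lo) * v1 * v2
    return total

def integral(f):
    """Exact integral of a piecewise-constant function."""
    return sum(((b - a) * v for a, b, v in f), F(0))

zero_moment = integral(PSI)                                                  # (iii)
norm_sq = inner(PSI, PSI)                                                    # unit norm
psi_vs_shifts = [inner(PSI, shift(PSI, k)) for k in range(-3, 4) if k != 0]  # (iv)
phi_vs_psi = [inner(shift(PHI, k), PSI) for k in range(-3, 4)]               # (v)
print(zero_moment, norm_sq, psi_vs_shifts, phi_vs_psi)
```

Exact fractions avoid any tolerance question: every checked quantity comes out as exactly 0 or 1.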

12.1.2 Haar Wavelet Series 

Thus far, we have constructed two functions, the scaling function $\varphi(t)$ and the wavelet $\psi(t)$, by iterating the Haar filter bank. That filter bank implements a discrete-time Haar basis; what about in continuous time? What we can say is that this scaling function and the wavelet, together with their integer shifts, $\{\varphi(t-k), \psi(t-k)\}_{k\in\mathbb{Z}}$, do constitute a basis for the space of piecewise-constant functions on intervals of half-integer length or more (see Figure 12.5(a)-(c)). We can see that as follows. Assume we are given a function $x^{(-1)}(t)$ that equals $a$ for $0 \le t < 1/2$; $b$ for $1/2 \le t < 1$; and 0 otherwise. Then,

$$\begin{aligned} x^{(-1)}(t) &= \frac{a+b}{2}\, \varphi(t) + \frac{a-b}{2}\, \psi(t) \\ &= \langle x^{(-1)}(t), \varphi(t)\rangle\, \varphi(t) + \langle x^{(-1)}(t), \psi(t)\rangle\, \psi(t) \\ &= x^{(0)}(t) + d^{(0)}(t). \end{aligned} \tag{12.15}$$

Had the function $x^{(-1)}(t)$ been nonzero on any other interval, we could have used the integer shifts of $\varphi(t)$ and $\psi(t)$.

Clearly, this process scales by 2. In other words, the scaled scaling function and the wavelet, together with their shifts by multiples of 1/2, $\{\sqrt{2}\varphi(2t-k), \sqrt{2}\psi(2t-k)\}_{k\in\mathbb{Z}}$, do constitute a basis for the space of piecewise-constant functions on intervals of quarter-integer length or more (see Figure 12.6(a)-(c)). Assume, for example, we are now given a function $x^{(-2)}(t)$ that equals $c$ for $0 \le t < 1/4$; $d$ for $1/4 \le t < 1/2$; and 0 otherwise. Then,

$$\begin{aligned} x^{(-2)}(t) &= \frac{c+d}{2}\, \varphi(2t) + \frac{c-d}{2}\, \psi(2t) \\ &= \langle x^{(-2)}(t), \sqrt{2}\varphi(2t)\rangle\, \sqrt{2}\varphi(2t) + \langle x^{(-2)}(t), \sqrt{2}\psi(2t)\rangle\, \sqrt{2}\psi(2t) \\ &= x^{(-1)}(t) + d^{(-1)}(t) \overset{(a)}{=} x^{(0)}(t) + d^{(0)}(t) + d^{(-1)}(t), \end{aligned}$$

where (a) follows from (12.15). In other words, we could have also decomposed $x^{(-2)}(t)$ using $\{\varphi(t-k), \psi(t-k), \sqrt{2}\psi(2t-k)\}_{k\in\mathbb{Z}}$.

Figure 12.5: Haar series decomposition of (a) $x^{(-1)}(t)$, a function constant on half-integer intervals, using $\{\varphi(t-k), \psi(t-k)\}_{k\in\mathbb{Z}}$, into (b) $x^{(0)}(t)$ and (c) $d^{(0)}(t)$.

Figure 12.6: Haar series decomposition of (a) $x^{(-2)}(t)$, a function constant on quarter-integer intervals, using $\{\sqrt{2}\varphi(2t-k), \sqrt{2}\psi(2t-k)\}_{k\in\mathbb{Z}}$, into (b) $x^{(-1)}(t)$ and (c) $d^{(-1)}(t)$.
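The averaging and differencing pattern of (12.15), iterated over scales, gives the Haar wavelet series of any function that is piecewise constant on dyadic intervals. The sketch below assumes $x(t)$ is supported on $[0, 1)$ and constant on the $N = 2^J$ intervals of length $2^{-J}$. It returns $\langle x, \varphi\rangle$ together with $\beta_k^{(\ell)} = \langle x, \psi_{\ell,k}\rangle$ for $\ell = 0, -1, \ldots, -(J-1)$, using the normalization $\psi_{\ell,k}(t) = 2^{-\ell/2}\psi(2^{-\ell}t - k)$, and compares the coefficient energy with $\int |x(t)|^2\,dt$. The function name `haar_series` is hypothetical.

```python
import math

def haar_series(values):
    """Haar coefficients of x(t) = values[n] on [n/N, (n+1)/N), with N = 2**J.

    Returns (alpha, betas): alpha = <x, phi> and betas[ell][k] = <x, psi_{ell,k}>,
    with psi_{ell,k}(t) = 2**(-ell/2) * psi(2**(-ell) * t - k), ell = 0, -1, ..., -(J-1).
    """
    n = len(values)
    J = n.bit_length() - 1
    if n == 0 or n != 2 ** J:
        raise ValueError("number of pieces must be a power of two")
    v = [float(x) for x in values]
    betas = {}
    # Start at the finest wavelet scale: the current pieces have width 2**(ell - 1).
    for ell in range(-(J - 1), 1):
        pairs = [(v[2 * k], v[2 * k + 1]) for k in range(len(v) // 2)]
        # <x, psi_{ell,k}> = 2**(ell/2) * (left - right) / 2; for ell = 0 this is (a - b)/2 in (12.15)
        betas[ell] = [2 ** (ell / 2) * (left - right) / 2 for left, right in pairs]
        # Pairwise averages give the projection onto the next coarser space.
        v = [(left + right) / 2 for left, right in pairs]
    return v[0], betas  # v[0] is the average over [0, 1), i.e. <x, phi>

x = [3.0, 1.0, 0.0, 0.0]  # x^{(-2)} from the text with c = 3, d = 1
alpha, betas = haar_series(x)
energy_time = sum(val * val for val in x) / len(x)  # integral of |x(t)|^2
energy_coeffs = alpha ** 2 + sum(b * b for bs in betas.values() for b in bs)
print(alpha, betas, energy_time, energy_coeffs)
```

For this input the coefficients are $\langle x, \varphi\rangle = 1$, $\beta_0^{(0)} = 1$ and $\beta_0^{(-1)} = 1/\sqrt{2}$, and both energies equal $2.5$, anticipating Parseval's equality (12.18).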

Continuing this argument, to represent piecewise-constant functions on intervals of length $2^{-\ell}$, we need the following basis:

Scale           Functions
0               scaling function and its shifts: $\varphi(t-k)$
0               wavelet and its shifts: $\psi(t-k)$
$-1$            wavelet and its shifts: $2^{1/2}\psi(2t-k)$
$\vdots$        $\vdots$
$-(\ell-1)$     wavelet and its shifts: $2^{(\ell-1)/2}\psi(2^{\ell-1}t-k)$

Definition of the Haar Wavelet Series From what we have seen, if we want to represent shorter and shorter constant pieces, we need to keep on adding wavelets with decreasing scale together with the scaling function at the coarsest scale. We may imagine, and we will formalize this in a moment, that if we let this process go to infinity, the scaling function will eventually become superfluous, and it does. This discussion leads to the Haar orthonormal set and a truly surprising result dating back to Haar in 1910: this orthonormal system is in fact a basis for $\mathcal{L}^2(\mathbb{R})$. This result is the Haar continuous-time counterpart of Theorem 9.2, which states that the discrete-time wavelet $h_n$ and its shifts and scales (equivalent iterated filters) form an orthonormal basis for the space of finite-energy sequences, $\ell^2(\mathbb{Z})$. The general result will be given by Theorem 12.6.

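For compactness, we start by renaming our basis functions as:

$$\varphi_{\ell,k}(t) = 2^{-\ell/2}\, \varphi(2^{-\ell}t - k) = \frac{1}{2^{\ell/2}}\, \varphi\Big(\frac{t - 2^{\ell}k}{2^{\ell}}\Big), \tag{12.16a}$$

$$\psi_{\ell,k}(t) = 2^{-\ell/2}\, \psi(2^{-\ell}t - k) = \frac{1}{2^{\ell/2}}\, \psi\Big(\frac{t - 2^{\ell}k}{2^{\ell}}\Big). \tag{12.16b}$$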



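A few of the wavelets are given in Figure 12.7. Since we will show that the Haar wavelets form an orthonormal basis, we can define the Haar wavelet series to be

$$\beta_k^{(\ell)} = \langle x, \psi_{\ell,k}\rangle = \int_{-\infty}^{\infty} x(t)\, \psi_{\ell,k}(t)\, dt, \qquad \ell, k \in \mathbb{Z}, \tag{12.17a}$$

and the inverse Haar wavelet series

$$x(t) = \sum_{\ell\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} \beta_k^{(\ell)}\, \psi_{\ell,k}(t). \tag{12.17b}$$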
Figure 12.7: Example Haar basis functions. (a) The prototype function $\psi(t) = \psi_{0,0}(t)$; (b) $\psi_{-1,1}(t)$; (c) $\psi_{-2,1}(t)$. (Repeated Figure [021)



We call $\beta_k^{(\ell)}$ the wavelet coefficients, and denote such a wavelet series pair by

$$x(t) \;\overset{\text{WS}}{\longleftrightarrow}\; \beta_k^{(\ell)}.$$



Properties of the Haar Wavelet Series

(i) Linearity: The Haar wavelet series operator is a linear operator.
(ii) Parseval's equality: The Haar wavelet series operator is a unitary operator and thus preserves the Euclidean norm (see (1.51)):

$$\|x\|^2 = \int_{-\infty}^{\infty} |x(t)|^2\, dt = \sum_{\ell\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} |\beta_k^{(\ell)}|^2. \tag{12.18}$$



(iii) Zero-Moment Property: We have seen earlier that, while the Haar scaling 
function with its integer shifts can reproduce constant functions, the Haar 
wavelet with its integer shifts annihilates them. As the wavelet series uses 
wavelets as its basis functions, it inherits that property; it annihilates constant 
functions. In the remainder of the chapter, we will see this to be true for 
higher-order polynomials with different wavelets. 

(iv) Characterization of singularities: One of the powerful properties of wavelet-like representations is that they can characterize the type and position of singularities via the behavior of wavelet coefficients. Assume, for example, that we want to characterize the step singularity present in the Heaviside function (3.8) with the step at location $t_0$. We compute the wavelet coefficient $\beta_k^{(\ell)}$ to get

$$\beta_k^{(\ell)} = \int_{-\infty}^{\infty} x(t)\, \psi_{\ell,k}(t)\, dt = \begin{cases} 2^{\ell/2}k - 2^{-\ell/2}t_0, & 2^{\ell}k \le t_0 < 2^{\ell}(k+\tfrac12); \\ -2^{\ell/2}(k+1) + 2^{-\ell/2}t_0, & 2^{\ell}(k+\tfrac12) \le t_0 < 2^{\ell}(k+1); \\ 0, & \text{otherwise}. \end{cases}$$

Because the Haar wavelets at a fixed scale do not overlap, there is at most one nonzero wavelet coefficient per scale, the one whose wavelet straddles the discontinuity; its magnitude is at most $2^{\ell/2}/2$. Therefore, as $\ell \to -\infty$, the wavelet series zooms towards the singularity, as shown in Figure 12.8. We see that as $\ell$ decreases from 5 to 1, the single nonzero wavelet coefficient gets closer and closer to the discontinuity.

Figure 12.8: Behavior of Haar wavelet coefficients across scales. We plot $\beta_k^{(\ell)}$ for $\ell = 1, 2, \ldots, 5$, where $k$ is dependent on the scale. Because the wavelet is Haar, there is exactly one nonzero coefficient per scale, the one corresponding to the wavelet that straddles the discontinuity.
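The closed form above can be cross-checked by integrating the step directly against each wavelet. The following sketch, with hypothetical helper names and an arbitrary non-dyadic step location $t_0 = 0.3$, confirms that at each scale $\ell = 0, -1, \ldots, -5$ exactly one coefficient is nonzero, that its wavelet straddles $t_0$, that it matches the formula, and that its magnitude stays below $2^{\ell/2}/2$.

```python
import math

def step_coeff(ell, k, t0):
    """<u(t - t0), psi_{ell,k}> by direct integration over the two Haar halves."""
    amp = 2.0 ** (-ell / 2)
    start = 2.0 ** ell * k
    mid = 2.0 ** ell * (k + 0.5)
    end = 2.0 ** ell * (k + 1)
    # Lengths of the positive and negative halves that lie where the step equals 1.
    pos = max(0.0, mid - max(start, t0))
    neg = max(0.0, end - max(mid, t0))
    return amp * (pos - neg)

def closed_form(ell, k, t0):
    """The case-by-case formula for beta_k^{(ell)} given in the text."""
    if 2.0 ** ell * k <= t0 < 2.0 ** ell * (k + 0.5):
        return 2.0 ** (ell / 2) * k - 2.0 ** (-ell / 2) * t0
    if 2.0 ** ell * (k + 0.5) <= t0 < 2.0 ** ell * (k + 1):
        return -(2.0 ** (ell / 2)) * (k + 1) + 2.0 ** (-ell / 2) * t0
    return 0.0

t0 = 0.3
nonzero_per_scale = {}
for ell in range(0, -6, -1):
    width = 2.0 ** ell
    ks = range(math.floor(-2 / width), math.ceil(2 / width))  # wavelets covering [-2, 2)
    coeffs = [(k, step_coeff(ell, k, t0)) for k in ks]
    nonzero_per_scale[ell] = [(k, b) for k, b in coeffs if b != 0.0]
    print(ell, nonzero_per_scale[ell])
```

Wavelets lying entirely on one side of $t_0$ give exactly zero here, the left ones because they see no signal and the right ones because of the zero-moment property.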



Multiresolution Analysis In the discrete-time chapters, we have often encountered coarse/detail approximating spaces $V$ and $W$. We now use the same intuition and start from similar spaces to build the Haar wavelet series in reverse. What we will see is how the iterative construction and the two-scale equations are the manifestations of a fundamental embedding property explicit in the multiresolution analysis of Mallat and Meyer.

We call $V^{(0)}$ the space of piecewise-constant functions over unit intervals, that is, we say that $x(t) \in V^{(0)}$ if and only if $x(t)$ is constant for $t \in [k, k+1)$, $k \in \mathbb{Z}$, and $x(t)$ is of finite $\mathcal{L}^2$ norm. Another way to phrase the above is to note that

$$V^{(0)} = \operatorname{span}\big(\{\varphi(t-k)\}_{k\in\mathbb{Z}}\big) = \operatorname{span}\big(\{\varphi_{0,k}\}_{k\in\mathbb{Z}}\big), \tag{12.19}$$

where $\varphi(t)$ is the Haar scaling function (12.6), and, since $\langle \varphi(t-k), \varphi(t-m)\rangle = \delta_{k-m}$, this scaling function and its integer translates form an orthonormal basis for $V^{(0)}$ (see Figure 12.9). Thus, $x(t)$ from $V^{(0)}$ can be written as a linear combination

$$x(t) = \sum_{k\in\mathbb{Z}} \alpha_k^{(0)}\, \varphi(t-k),$$

where $\alpha_k^{(0)}$ is simply the value of $x(t)$ on the interval $[k, k+1)$. Since the basis is orthonormal,

$$\|x(t)\|^2 = \|\alpha^{(0)}\|^2,$$

which is Parseval's equality for this orthonormal basis.

Figure 12.9: Haar multiresolution spaces and basis functions. (a) A function $x^{(0)}(t) \in V^{(0)}$. (b) Basis functions for $V^{(0)}$.

Figure 12.10: Multiresolution spaces.

We now introduce a scaled version of $V^{(0)}$ called $V^{(\ell)}$, the space of piecewise-constant functions over intervals of size $2^{\ell}$, that is, $[2^{\ell}k, 2^{\ell}(k+1))$, $\ell \in \mathbb{Z}$. Then,

$$V^{(\ell)} = \operatorname{span}\big(\{\varphi_{\ell,k}\}_{k\in\mathbb{Z}}\big),$$

for $\ell \in \mathbb{Z}$. For $\ell > 0$, $V^{(\ell)}$ is a stretched version of $V^{(0)}$, and for $\ell < 0$, $V^{(\ell)}$ is a compressed version of $V^{(0)}$ (both by a factor $2^{|\ell|}$). Moreover, the fact that functions constant over $[2^{\ell}k, 2^{\ell}(k+1))$ are also constant over $[2^{m}k, 2^{m}(k+1))$, $\ell > m$, leads to the inclusion property (see Figure 12.10),

$$V^{(\ell)} \subset V^{(m)}, \qquad \ell > m. \tag{12.20}$$

We can use this to derive the two-scale equation (12.8), by noting that because of $V^{(0)} \subset V^{(-1)}$, $\varphi(t)$ can be expanded in the basis for $V^{(-1)}$. Graphically, we show the spaces $V^{(0)}$, $V^{(-1)}$, and their basis functions in Figures 12.9 and 12.11; the two-scale equation was shown in Figure 12.3.

What about the detail spaces? Take a function $x^{(-1)}(t)$ in $V^{(-1)}$ but not in $V^{(0)}$; such a function is constant over half-integer intervals but not over integer intervals (see Figure 12.11(a)). Decompose it as a sum of its projections onto $V^{(0)}$ and $W^{(0)}$, the latter being the orthogonal complement of $V^{(0)}$ in $V^{(-1)}$ (see Figure 12.12):

$$x^{(-1)}(t) = P_{V^{(0)}}(x^{(-1)})(t) + P_{W^{(0)}}(x^{(-1)})(t). \tag{12.21}$$
Figure 12.11: Haar multiresolution spaces and basis functions. (a) A function $x^{(-1)}(t) \in V^{(-1)}$. (b) Basis functions $\sqrt{2}\varphi(2t-k)$ for $V^{(-1)}$.

Figure 12.12: A function $x^{(-1)}(t) \in V^{(-1)}$ as the sum of its projections $x^{(0)}(t) = P_{V^{(0)}}(x^{(-1)})(t)$ onto $V^{(0)}$ and $d^{(0)}(t) = P_{W^{(0)}}(x^{(-1)})(t)$ onto $W^{(0)}$.



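We first find the projection of $x^{(-1)}(t)$ onto $V^{(0)}$ as

$$\begin{aligned} x^{(0)}(t) = P_{V^{(0)}}(x^{(-1)})(t) &= \sum_{k\in\mathbb{Z}} \langle x^{(-1)}(t), \varphi(t-k)\rangle\, \varphi(t-k) \\ &\overset{(a)}{=} \sum_{k\in\mathbb{Z}} \langle x^{(-1)}(t), \varphi(2t-2k) + \varphi(2t-2k-1)\rangle\, \varphi(t-k) \\ &\overset{(b)}{=} \sum_{k\in\mathbb{Z}} \frac{1}{2}\Big[x^{(-1)}(k) + x^{(-1)}\big(k+\tfrac12\big)\Big]\, \varphi(t-k), \end{aligned} \tag{12.22}$$

where (a) follows from the two-scale equation (12.8); and (b) from evaluating the inner product between $x^{(-1)}(t)$ and the basis functions for $V^{(-1)}$. In other words, $x^{(0)}(t)$ is simply the average of $x^{(-1)}(t)$ over two successive intervals. This is the best least-squares approximation of $x^{(-1)}(t)$ by a function in $V^{(0)}$ (see Exercise 12.5).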

We now find the projection of $x^{(-1)}$ onto $W^{(0)}$. Subtract the projection $x^{(0)}$ from $x^{(-1)}$ and call the difference $d^{(0)}$. Since $x^{(0)}$ is an orthogonal projection, $d^{(0)}$ is orthogonal to $V^{(0)}$ (see Figure 12.12). Using (12.22) leads to

$$\begin{aligned} d^{(0)}(t) = x^{(-1)}(t) - x^{(0)}(t) &= \begin{cases} \tfrac12\big[x^{(-1)}(k) - x^{(-1)}(k+\tfrac12)\big], & k \le t < k + \tfrac12; \\ -\tfrac12\big[x^{(-1)}(k) - x^{(-1)}(k+\tfrac12)\big], & k + \tfrac12 \le t < k+1, \end{cases} \\ &= \sum_{k\in\mathbb{Z}} \frac{1}{2}\Big[x^{(-1)}(k) - x^{(-1)}\big(k+\tfrac12\big)\Big]\, \psi(t-k) = \sum_{k\in\mathbb{Z}} \beta_k^{(0)}\, \psi(t-k). \end{aligned} \tag{12.23}$$

Figure 12.13: Haar decomposition of a function (a) $x^{(-1)}(t) \in V^{(-1)}$ into a projection (b) $x^{(0)}(t) \in V^{(0)}$ (average over two successive intervals) and (c) $d^{(0)}(t) \in W^{(0)}$ (difference over two successive intervals). (d)-(f) Appropriate basis functions.

We have thus informally shown that the space $V^{(-1)}$ can be decomposed as

$$V^{(-1)} = V^{(0)} \oplus W^{(0)} \tag{12.24}$$

(see an example in Figure 12.13). We also derived bases for these spaces: scaling functions and their shifts for $V^{(-1)}$ and $V^{(0)}$, and wavelets and their shifts for $W^{(0)}$. This process can be further iterated on $V^{(0)}$ (see Figure 12.10).
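A small numerical illustration of (12.21) to (12.24), assuming $x^{(-1)}(t)$ is supported on $[0, 3)$ and stored as its six values on the half-integer intervals: averaging pairs gives $x^{(0)}$, the remainder $d^{(0)}$ takes opposite values $\pm\beta_k^{(0)}$ on the two halves of each unit interval (so it is a combination of $\psi(t-k)$), the two parts are orthogonal, and moving $x^{(0)}$ to another element of $V^{(0)}$ only increases the approximation error. The helper names and the sample values are illustrative.

```python
def project_v0(v):
    """P_{V^(0)} from (12.22): replace each pair of half-interval values by its mean."""
    out = []
    for k in range(len(v) // 2):
        mean = (v[2 * k] + v[2 * k + 1]) / 2
        out += [mean, mean]
    return out

def inner_half(f, g):
    """L2 inner product of two functions constant on intervals of length 1/2."""
    return sum(a * b for a, b in zip(f, g)) / 2

x_m1 = [4.0, 2.0, -1.0, 3.0, 0.5, 0.5]                           # x^{(-1)} on [0, 3)
x0 = project_v0(x_m1)                                            # x^{(0)}
d0 = [a - b for a, b in zip(x_m1, x0)]                           # d^{(0)}
beta0 = [(x_m1[2 * k] - x_m1[2 * k + 1]) / 2 for k in range(3)]  # beta_k^{(0)} in (12.23)

err_best = inner_half(d0, d0)  # squared error of the orthogonal projection
# Another element of V^(0): shift the value on the unit interval [1, 2) by 0.25.
y = x0[:]
y[2], y[3] = y[2] + 0.25, y[3] + 0.25
diff = [a - b for a, b in zip(x_m1, y)]
err_other = inner_half(diff, diff)
print(x0, d0, inner_half(x0, d0), err_best, err_other)
```

Here $x^{(0)}$ takes the values 3, 1, 0.5 on the three unit intervals, and the error grows from 5 to 5.0625 when $x^{(0)}$ is perturbed, as expected from the least-squares property.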

12.1.3 Haar Frame Series 

The Haar wavelet series we just saw is an elegant and completely nonredundant representation. As we have seen in earlier chapters, at times we can benefit from relaxing this constraint and allowing some redundancy in the system. Our aim would then be to build frames. There are many ways in which we could do that. For example, by adding to the Haar wavelet basis wavelets at points halfway in between the existing ones, we would have twice as many wavelets, leading to a redundant series representation with a redundancy factor of 2, a simple example we will use to illustrate Haar frame series.

Figure 12.14: A few Haar wavelets at scale $\ell = 0$ for the (a) Haar wavelet series (nonredundant) and (b) Haar frame series (redundant). Clearly, there are twice as many wavelets in (b), making it a redundant expansion with the redundancy factor 2.

Definition of the Haar Frame Series We now relax the constraint of critical sampling, but still retain the series expansion. That is, we assume that expansion coefficients are

$$\beta_k^{(\ell)} = \langle x, \psi_{\ell,k}\rangle = \int_{-\infty}^{\infty} x(t)\, \psi_{\ell,k}(t)\, dt, \qquad \ell, k \in \mathbb{Z}, \tag{12.25}$$

where $\psi_{\ell,k}(t)$ is now given by

$$\psi_{\ell,k}(t) = a_0^{-\ell/2}\, \psi(a_0^{-\ell}t - b_0 k) = a_0^{-\ell/2}\, \psi\Big(\frac{t - a_0^{\ell} b_0 k}{a_0^{\ell}}\Big), \tag{12.26}$$

with $a_0 > 1$ and $b_0 > 0$. With $a_0 = 2$ and $b_0 = 1$, we get back our nonredundant Haar wavelet series. What we have allowed ourselves to do here is to choose scale factors different from 2, as well as different coverage of the time axis by shifted wavelets at a fixed scale. For example, keep the scale factor the same, $a_0 = 2$, but allow overlap of half width between Haar wavelets, that is, choose $b_0 = 1/2$. Figure 12.14(b) shows how many wavelets then populate the time axis at a fixed scale (example for $\ell = 0$), compared to the wavelet series (part (a) of the same figure).



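Properties of the Haar Frame Series Such a Haar frame series satisfies similar properties as the Haar wavelet series: it is linear, it is able to characterize singularities, and it inherits the zero-moment property. One property, though, Parseval's equality, bears further scrutiny. With $a_0 = 2$ and $b_0 = 1/2$, the coefficients $\beta_{2m}^{(\ell)}$ with even index are exactly the Haar wavelet series coefficients (12.17a), while the coefficients $\beta_{2m+1}^{(\ell)}$ with odd index come from the wavelets $2^{-\ell/2}\psi(2^{-\ell}t - m - \tfrac12)$ halfway in between. Let us express the energy of the expansion coefficients accordingly:

$$\sum_{\ell\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} |\beta_k^{(\ell)}|^2 = \sum_{\ell\in\mathbb{Z}} \sum_{m\in\mathbb{Z}} |\beta_{2m}^{(\ell)}|^2 + \sum_{\ell\in\mathbb{Z}} \sum_{m\in\mathbb{Z}} |\beta_{2m+1}^{(\ell)}|^2 \overset{(a)}{=} \|x\|^2 + \sum_{\ell\in\mathbb{Z}} \sum_{m\in\mathbb{Z}} |\beta_{2m+1}^{(\ell)}|^2,$$

where (a) follows from (12.18). In other words, this frame series behaves like two wavelet systems glued together: the even-indexed coefficients alone already carry the full energy of the input function, and the odd-indexed ones add a nonnegative amount on top. Unlike the original system, however, the half-shifted wavelets are not orthogonal across scales (for example, $\langle \psi(t-\tfrac12), \sqrt{2}\,\psi(2t-\tfrac32)\rangle = \sqrt{2}/2$), so the second sum does not equal $\|x\|^2$ in general; the coefficient energy is therefore not simply twice that of the input, and this particular frame is not tight.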
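The energy split above can be checked exactly for a concrete input. The sketch below assumes the redundant Haar system with $a_0 = 2$ and $b_0 = 1/2$, takes $x(t) = \psi(t - \tfrac12)$ (so $\|x\|^2 = 1$), and sums the squared coefficients over the scales $\ell = -4, \ldots, 3$ with exact rational arithmetic (the square of $2^{-\ell/2}$ times a rational number is rational). All nonzero integer-shift coefficients of this $x$ lie in that range, so that part equals $\|x\|^2 = 1$; the half-integer-shift part, although truncated, already exceeds 1, so the total exceeds $2\|x\|^2$. Helper names are hypothetical.

```python
from fractions import Fraction as F

def inner(f, g):
    """Exact inner product of real piecewise-constant functions given as (start, end, value)."""
    total = F(0)
    for a1, b1, v1 in f:
        for a2, b2, v2 in g:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo < hi:
                total += (hi - lo) * v1 * v2
    return total

def unnormalized_atom(ell, s):
    """psi(2**(-ell) t - s) as pieces; the unit-norm frame atom is 2**(-ell/2) times this."""
    w = F(2) ** ell
    half = F(1, 2)
    return [(w * s, w * (s + half), 1), (w * (s + half), w * (s + 1), -1)]

def frame_energy(x, ells, ks):
    """Sum of squared frame coefficients, split into integer and half-integer shifts s."""
    integer_part, half_part = F(0), F(0)
    for ell in ells:
        for k in ks:
            for s in (F(k), F(k) + F(1, 2)):
                r = inner(x, unnormalized_atom(ell, s))
                beta_sq = r * r / F(2) ** ell  # (2**(-ell/2) * r) ** 2, exactly
                if s.denominator == 1:
                    integer_part += beta_sq
                else:
                    half_part += beta_sq
    return integer_part, half_part

x = [(F(1, 2), F(1), 1), (F(1), F(3, 2), -1)]  # x(t) = psi(t - 1/2)
norm_sq = inner(x, x)
integer_part, half_part = frame_energy(x, range(-4, 4), range(-2, 30))
print(norm_sq, integer_part, half_part, integer_part + half_part)
```

With this range the half-integer part comes out as $85/32$; finer scales would only add further nonnegative terms.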
Figure 12.15: The prototype wavelet in the Haar continuous wavelet transform.

12.1.4 Haar Continuous Wavelet Transform 

We finally relax all the constraints and discuss the most redundant version of a wavelet expansion, where the Haar wavelet (12.12) is now shifted to be centered at $t = 0$ (see Figure 12.15):

$$\psi(t) = \begin{cases} 1, & -1/2 \le t < 0; \\ -1, & 0 \le t < 1/2; \\ 0, & \text{otherwise}. \end{cases} \tag{12.27}$$

Then, instead of scales $a = a_0^{\ell}$, we allow all positive real numbers, $a \in \mathbb{R}^+$. Similarly, instead of shifts $b = a_0^{\ell} b_0 k$, we allow all real numbers, $b \in \mathbb{R}$:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\Big(\frac{t-b}{a}\Big), \qquad a \in \mathbb{R}^+,\ b \in \mathbb{R}, \tag{12.28}$$

with $\psi(t)$ the Haar wavelet. The scaled and shifted Haar wavelet is then centered at $t = b$ and scaled by a factor $a$. All the wavelets are again of unit norm, $\|\psi_{a,b}\| = 1$. For $a = 2^{\ell}$ and $b = 2^{\ell}(k + \tfrac12)$ (the extra half shift undoes the centering), we get the nonredundant wavelet basis as in TBD.

Definition of the Haar Continuous Wavelet Transform We then define the Haar continuous wavelet transform to be (an example is given in Figure 12.16):

$$X(a,b) = \langle x, \psi_{a,b}\rangle = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}(t)\, dt, \qquad a \in \mathbb{R}^+,\ b \in \mathbb{R}, \tag{12.29a}$$

with $\psi_{a,b}(t)$ from (12.28), and the inverse Haar continuous wavelet transform

$$x(t) = \frac{1}{C_\psi} \int_{a\in\mathbb{R}^+} \int_{b\in\mathbb{R}} X(a,b)\, \psi_{a,b}(t)\, \frac{db\, da}{a^2}, \tag{12.29b}$$

where the equality holds in the $\mathcal{L}^2$ sense. To denote such a pair, we write:

$$x(t) \;\overset{\text{CWT}}{\longleftrightarrow}\; X(a,b).$$
Figure 12.16: (a) The Haar wavelet transform of an example function $x(t)$ (hat function). (b) Illustration of the shift-in-time property for $x(t-2)$. (c) Illustration of the scaling-in-time property for $x(t/2)$.



Properties of the Haar Continuous Wavelet Transform

(i) Linearity: The Haar continuous wavelet transform operator is a linear operator.
(ii) Shift in time: A shift in time by $t_0$ results in (see Figure 12.16(b))

$$x(t - t_0) \;\overset{\text{CWT}}{\longleftrightarrow}\; X(a, b - t_0). \tag{12.30}$$

(iii) Scaling in time: Scaling in time by $\alpha > 0$ results in (see Figure 12.16(c))

$$x(\alpha t) \;\overset{\text{CWT}}{\longleftrightarrow}\; \frac{1}{\sqrt{\alpha}}\, X(\alpha a, \alpha b). \tag{12.31}$$



(iv) Parseval's equality: Parseval's equality holds for the Haar continuous wavelet transform; we omit it here and revisit it in the general context in (12.112).

(v) Redundancy: Just like for the local Fourier transform, the continuous wavelet transform maps a function of one variable into a function of two variables. It is thus highly redundant, and this redundancy is expressed by the reproducing kernel:

$$K(a_0, b_0, a, b) = \langle \psi_{a_0,b_0}, \psi_{a,b}\rangle, \tag{12.32}$$

a four-dimensional function. Figure 12.17 shows the reproducing kernel of the Haar wavelet, namely $K(1, 0, a, b)$; note that the reproducing kernel is zero at all dyadic scale points as the wavelets are then orthogonal to each other.

(vi) Characterization of singularities: This is one of the most important properties of the continuous wavelet transform, since, by looking at its behavior, we can infer the type and position of singularities occurring in the function.

For example, assume we are given a Dirac delta function at location $t_0$, $x(t) = \delta(t - t_0)$. At scale $a$, only those Haar wavelets whose support straddles $t_0$ will produce nonzero coefficients, namely $X(a,b) = \psi_{a,b}(t_0)$. Because of the form of the Haar wavelet, the nonzero coefficients extend over a region of width $a$ around $t_0$. As $a \to 0$, these coefficients focus arbitrarily closely on the singularity. Moreover, these coefficients grow at a specific rate (here as $a^{-1/2}$), another way to identify the type of a singularity. We will go into more details on this later in the chapter.

Figure 12.17: The Haar wavelet transform of the Haar wavelet (12.12). This is also the reproducing kernel, $K(1, 0, a, b)$, of the Haar wavelet.

As a simple example, take $x(t)$ to be the Heaviside function (3.8) with the step at location $t_0$. We want to see how the Haar wavelet (12.27) isolates and characterizes the step singularity. To do that, we will need two things: (1) The primitive of the wavelet, defined as

$$\theta(t) = \int_{-\infty}^{t} \psi(\tau)\, d\tau = \begin{cases} 1/2 - |t|, & |t| < 1/2; \\ 0, & \text{otherwise}, \end{cases} \tag{12.33}$$

that is, a hat function (3.49a) on the interval $|t| < 1/2$. Note that the primitive of the scaled and normalized wavelet $a^{-1/2}\psi(t/a)$ is $\sqrt{a}\,\theta(t/a)$, or a factor $a$ larger due to integration. (2) We also need the derivative of $x(t)$, which exists only in a generalized sense (using distributions) and can be shown to be a Dirac delta function at $t_0$, $x'(t) = \delta(t - t_0)$.

Now, the continuous wavelet transform of the Heaviside function follows as

$$\begin{aligned} X(a,b) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{a}}\, \psi\Big(\frac{t-b}{a}\Big)\, x(t)\, dt \\ &\overset{(a)}{=} \Big[\sqrt{a}\, \theta\Big(\frac{t-b}{a}\Big)\, x(t)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \sqrt{a}\, \theta\Big(\frac{t-b}{a}\Big)\, x'(t)\, dt \\ &\overset{(b)}{=} -\int_{-\infty}^{\infty} \sqrt{a}\, \theta\Big(\frac{t-b}{a}\Big)\, \delta(t - t_0)\, dt \\ &\overset{(c)}{=} -\sqrt{a}\, \theta\Big(\frac{t_0-b}{a}\Big), \end{aligned} \tag{12.34}$$

where (a) follows from integration by parts; (b) from $\theta$ being of compact support; and (c) from the shifting property of the Dirac delta function in Table 3.3. Thus, as $a \to 0$, the continuous wavelet transform zooms towards the singularity and scales as $a^{1/2}$, with a shape given by the primitive of the wavelet $\theta(t)$; thus, we may expect a hat-like region of influence (see Figure 12.18 at the step discontinuity, $t = 4$, for illustration).

Figure 12.18: The Haar wavelet transform of a piecewise-polynomial function $x(t)$ as in (12.35).

This discussion focused on the behavior of the Haar wavelet transform around a point of singularity; what about smooth regions? Take $x(t)$ to be

$$x(t) = \begin{cases} t - 1, & 1 \le t < 2; \\ 2, & 2 \le t < 4; \\ 0, & \text{otherwise}, \end{cases} \tag{12.35}$$

and the Haar wavelet (12.27). The function $x(t)$ has three singularities, at $t = 1, 2, 4$: a discontinuity of the derivative at $t = 1$ and jumps at $t = 2$ and $t = 4$. The wavelet has 1 zero moment, so it will have a zero inner product inside the interval $[2, 4]$, where $x(t)$ is constant (see Figure 12.18). What happens in the interval $[1, 2]$, where $x(t)$ is linear?

Calculate the continuous wavelet transform for some shift $b \in [1, 2]$ for $a$ sufficiently small so that the support of the shifted wavelet $[b - a/2, b + a/2] \subset [1, 2]$:

$$X(a,b) = \frac{1}{\sqrt{a}} \Big( \int_{b-a/2}^{b} (t - 1)\, dt - \int_{b}^{b+a/2} (t - 1)\, dt \Big) = -\frac{1}{4}\, a^{3/2}. \tag{12.36}$$

Thus, the lack of a second zero moment (which would have produced a zero inner product) produces a residual of order $a^{3/2}$ as $a \to 0$.

To study qualitatively the overall behavior, there will be two cones of influence at the jump points $t = 2$ and $t = 4$, with an order $a^{1/2}$ behavior as in (12.34), a constant of order $a^{3/2}$ in the $(1, 2)$ interval as in (12.36) (which spills over into the $(2, 4)$ interval), and zero elsewhere, shown in Figure 12.18.
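To see (12.34) and (12.36) numerically, the sketch below approximates $X(a,b)$ from (12.29a) with a midpoint rule over the support $[b - a/2, b + a/2)$ of $\psi_{a,b}$, using the centered Haar wavelet (12.27). The step location $t_0 = 1.3$, the number of sample points, and the helper names are arbitrary choices for illustration. The checks compare the step response against $-\sqrt{a}\,\theta((t_0 - b)/a)$ and the linear piece of (12.35) against $-a^{3/2}/4$.

```python
import math

def psi(t):
    """Haar wavelet centered at t = 0, as in (12.27)."""
    if -0.5 <= t < 0.0:
        return 1.0
    if 0.0 <= t < 0.5:
        return -1.0
    return 0.0

def theta(t):
    """Primitive of the wavelet, (12.33): a hat of height 1/2 on |t| < 1/2."""
    return 0.5 - abs(t) if abs(t) < 0.5 else 0.0

def cwt(x, a, b, n=4000):
    """Midpoint-rule estimate of X(a, b) = <x, psi_{a,b}> over the support [b - a/2, b + a/2)."""
    h = a / n
    total = 0.0
    for m in range(n):
        u = (m + 0.5) / n - 0.5  # (t - b)/a at the m-th midpoint
        total += x(b + a * u) * psi(u)
    return total * h / math.sqrt(a)

T0 = 1.3  # assumed step location

def step(t):
    """Heaviside step at T0."""
    return 1.0 if t >= T0 else 0.0

def x_example(t):
    """The piecewise-polynomial function (12.35)."""
    if 1.0 <= t < 2.0:
        return t - 1.0
    if 2.0 <= t < 4.0:
        return 2.0
    return 0.0

step_errors = [abs(cwt(step, a, b) + math.sqrt(a) * theta((T0 - b) / a))
               for a in (0.5, 0.1) for b in (T0 - 0.2, T0, T0 + 0.03, T0 + 1.0)]
ramp_values = [cwt(x_example, a, 1.5) for a in (0.2, 0.05)]
print(max(step_errors), ramp_values, [-a ** 1.5 / 4 for a in (0.2, 0.05)])
```

Because the integrand is linear on each half of the wavelet in the ramp case, the midpoint rule is exact there up to rounding; for the step, the error is bounded by roughly one sample spacing.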
Chapter Outline

We now follow the path set through this simple Haar example, and follow with more general developments. We start in Section 12.2 with iterated filter banks building scaling functions and wavelets as limits of iterated filters. We study issues of convergence and smoothness of resulting functions. In Section 12.3, we then define and look into the properties of wavelet series: localization, zero moments and decay of wavelet coefficients, before considering the characterization of singularities by the decay of the associated wavelet series coefficients. We study multiresolution analysis, revisiting the wavelet construction from an axiomatic point of view. In Section 12.4, we relax the constraints of nonredundancy to construct wavelet frames, midway between the nonredundant wavelet series and a completely redundant wavelet transform. We follow in Section 12.5 with the continuous wavelet transform. Section 12.6 is devoted to computational issues, in particular to Mallat's algorithm, which allows us to compute wavelet coefficients with an initial continuous-time projection followed by a discrete-time, filter-bank algorithm.

Notation used in this chapter: In most of this chapter, we consider real-valued wavelets only and thus the domain for the scale factor $a$ is $\mathbb{R}^+$; the extension to complex wavelets requires simply $a \in \mathbb{R}$, $a \neq 0$.



12.2 Scaling Function and Wavelets from Orthogonal Filter Banks

In the previous section, we set the stage for this section by examining basic properties of iterated filter banks with Haar filters. The results in this section should thus not come as a surprise, as they generalize those for Haar filter banks.

We start with an orthogonal filter bank with filters $g$ and $h$ whose properties were summarized in Table 7.9 of Chapter 7, where we used these filters and their shifts by multiples of two as the basis for $\ell^2(\mathbb{Z})$. This orthonormal basis was implemented using a critically-sampled two-channel filter bank with down- and upsampling by 2, an orthogonal synthesis lowpass/highpass filter pair $g_n, h_n$ and a corresponding analysis lowpass/highpass filter pair $g_{-n}, h_{-n}$. We then used these filters and the associated two-channel filter bank as building blocks for a DWT in Chapter 9. For example, we saw that in a 3-level iterated Haar filter bank, the lowpass and highpass at level 3 were given by (9.1c)-(9.1d) and plotted in Figure 9.4; we repeated the last-level filters in (12.2). Another example, a 3-level iterated filter bank with Daubechies filters, was given in Example 9.1.

12.2.1 Iterated Filters 

As for the Haar case, we come back to filters and their iterations, and associate 
a continuous-time function to the discrete-time sequence representing the impulse 
response of the iterated filter. 

We assume a length-$L$ orthogonal lowpass/highpass filter pair $(g_n, h_n)$, and write the equivalent filters at the last level of a $J$-level iterated filter bank:

$$G^{(J)}(z) = \prod_{\ell=0}^{J-1} G(z^{2^{\ell}}) = G^{(J-1)}(z)\, G(z^{2^{J-1}}), \tag{12.37a}$$

$$H^{(J)}(z) = \prod_{\ell=0}^{J-2} G(z^{2^{\ell}})\, H(z^{2^{J-1}}) = G^{(J-1)}(z)\, H(z^{2^{J-1}}). \tag{12.37b}$$

We know that, by construction, these filters are orthonormal to their shifts by $2^J$, (9.6a), (9.11a), as well as orthogonal to each other, (9.14a).

The equivalent filters $g^{(J)}$, $h^{(J)}$ have norm 1 and length $L^{(J)}$, which can be upper bounded by (see (9.5b))

$$L^{(J)} \le (L-1)\, 2^J. \tag{12.38}$$
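Equation (12.37a) translates directly into code: each stage upsamples $g$ by $2^{\ell}$ and convolves. The sketch below assumes the 4-tap Daubechies lowpass filter in its common normalization $g = \frac{1}{4\sqrt2}(1+\sqrt3,\ 3+\sqrt3,\ 3-\sqrt3,\ 1-\sqrt3)$ (the text refers to (9.10) for the filter used in its figures), and checks the unit norm, the orthonormality to shifts by $2^J$, the length bound (12.38), and $G^{(J)}(1) = 2^{J/2}$, which follows from (12.41). Function names are illustrative.

```python
import math

S3, S2 = math.sqrt(3.0), math.sqrt(2.0)
# Assumed 4-tap Daubechies lowpass filter (coefficients sum to sqrt(2), unit norm).
G4 = [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2), (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)]

def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(h, m):
    """H(z) -> H(z^m): insert m - 1 zeros between consecutive taps."""
    out = [0.0] * ((len(h) - 1) * m + 1)
    out[::m] = h
    return out

def iterated_lowpass(g, J):
    """Impulse response of G^(J)(z) = prod_{ell=0}^{J-1} G(z^(2^ell)), as in (12.37a)."""
    gJ = [1.0]
    for ell in range(J):
        gJ = convolve(gJ, upsample(g, 2 ** ell))
    return gJ

def shifted_inner(f, s):
    """sum_n f[n] f[n + s] for a finite sequence f and a shift s >= 0."""
    return sum(f[n] * f[n + s] for n in range(len(f) - s))

J = 3
g3 = iterated_lowpass(G4, J)
print(len(g3), sum(v * v for v in g3), [shifted_inner(g3, m * 2 ** J) for m in (1, 2)], sum(g3))
```

For $J = 3$ the equivalent filter has $3 \cdot 7 + 1 = 22$ taps, within the bound $(L-1)2^J = 24$. Replacing `G4` by the Haar pair $(1/\sqrt2, 1/\sqrt2)$ yields $2^J$ taps all equal to $2^{-J/2}$, consistent with the Haar case.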

12.2.2 Scaling Function and its Properties 

We now associate a piecewise-constant function $\varphi^{(J)}(t)$ to $g_n^{(J)}$ so that $\varphi^{(J)}(t)$ is of finite length and norm 1. Since the number of piecewise segments (equal to the number of nonzero coefficients of $g_n^{(J)}$) grows exponentially with $J$ (see (12.38)), we choose their width as $2^{-J}$, upper bounding the length of $\varphi^{(J)}(t)$ by $(L-1)$:

$$\operatorname{support}(\varphi^{(J)}(t)) \subset [0, L-1], \tag{12.39}$$

where $\operatorname{support}(\cdot)$ stands for the interval of the real line where the function is different from zero. For $\varphi^{(J)}(t)$ to inherit the unit-norm property from $g_n^{(J)}$, we choose the height of the piecewise segments as $2^{J/2} g_n^{(J)}$. Then, the $n$th piece of $\varphi^{(J)}(t)$ contributes

$$\int_{n/2^J}^{(n+1)/2^J} |\varphi^{(J)}(t)|^2\, dt = \int_{n/2^J}^{(n+1)/2^J} 2^J \big(g_n^{(J)}\big)^2\, dt = \big(g_n^{(J)}\big)^2$$

to $\|\varphi^{(J)}\|^2$. Summing up the individual contributions,

$$\|\varphi^{(J)}(t)\|^2 = \sum_{n=0}^{L^{(J)}-1} \int_{n/2^J}^{(n+1)/2^J} |\varphi^{(J)}(t)|^2\, dt = \sum_{n=0}^{L^{(J)}-1} \big(g_n^{(J)}\big)^2 = 1.$$

We have thus defined the piecewise-constant function as

$$\begin{aligned} \varphi^{(J)}(t) &= 2^{J/2} g_n^{(J)}, \qquad \frac{n}{2^J} \le t < \frac{n+1}{2^J}, \\ &= \sum_{n=0}^{L^{(J)}-1} g_n^{(J)}\, 2^{J/2}\, \varphi_h(2^J t - n), \end{aligned} \tag{12.40}$$
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Figure 12.19: Iterated filter 2 J ^ 2 g n and associated piecewise-constant function tp( J \t) 
based on a 4-tap Daubechies lowpass filter ( |9.10D at level (a)-(b) J = 1; (c)-(d) J — 2; 
(e)-(f) J — 3; and (g)-(h) J — 4. Note that we have rescaled the equivalent filters' 
impulse responses as well as plotted them at different discrete intervals to highlight the 
correspondences with their piecewise-constant functions. 



where $\varphi_h(t)$ is the Haar scaling function (box function) from (12.6). We verified that the above iterated function is supported on a finite interval and has unit norm.¹⁵⁴ In Figure 12.19 we show a few iterations of a 4-tap filter from Example 9.1 and its associated piecewise-constant function. The piecewise-constant function $\varphi^{(J)}$ has geometrically decreasing piecewise segments and a support contained in the interval $[0, 3]$. From the figure it is clear that the smoothness of $\varphi^{(J)}(t)$ depends on the smoothness of $g_n^{(J)}$. If the latter tends, as $J$ increases, to a sequence with little local variation, then the piecewise-constant approximation will tend to a smooth function as well, as the piecewise segments become finer and finer. On the contrary, if $g_n^{(J)}$ has too much variation as $J \to \infty$, the sequence of functions $\varphi^{(J)}(t)$ might not have a limit as $J \to \infty$. This leads to the following necessary condition for the filter $g_n$, the proof of which is given in Solved Exercise 12.1.

154 We could have defined a piecewise-linear function instead of a piecewise-constant one, but it does not change the behavior of the limit we will study.



Proposition 12.1 (Necessity of a zero at $\pi$) For $\lim_{J\to\infty} \varphi^{(J)}(t)$ to exist, it is necessary for $G(e^{j\omega})$ to have a zero at $\omega = \pi$.



As a direct corollary of this result, the necessity of a zero at $\omega = \pi$ translates also to the necessity of

$$G(e^{j\omega})\big|_{\omega=0} = \sqrt{2}, \tag{12.41}$$

because of (7.13). We are now ready to define the limit function:



Definition 12.2 (Scaling function) We call the scaling function $\varphi(t)$ the limit, when it exists, of

$$\varphi(t) = \lim_{J\to\infty} \varphi^{(J)}(t). \tag{12.42}$$
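Definition 12.2 concerns the sequence of piecewise-constant functions (12.40). Below is a minimal NumPy sketch, with hypothetical helper names, that evaluates $\varphi^{(J)}(t)$ directly from the taps of $g^{(J)}$ and confirms the unit norm and the support bound (12.39) for the 4-tap Daubechies filter (9.10). The printed values at $t = 1$ only let one watch the iterates as $J$ grows; they are not a convergence proof.

```python
import numpy as np

def iterated_lowpass(g, J):
    """Taps of G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l)), as in (12.37a)."""
    gJ = np.array([1.0])
    for l in range(J):
        up = np.zeros((len(g) - 1) * 2 ** l + 1)
        up[:: 2 ** l] = g
        gJ = np.convolve(gJ, up)
    return gJ

def phi_J(t, g, J):
    """phi^(J)(t) of (12.40): equal to 2^(J/2) g_n^(J) on [n/2^J, (n+1)/2^J), zero elsewhere."""
    gJ = iterated_lowpass(g, J)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n = np.floor(t * 2 ** J).astype(int)   # index of the segment containing t
    inside = (n >= 0) & (n < len(gJ))
    out = np.zeros_like(t)
    out[inside] = 2 ** (J / 2) * gJ[n[inside]]
    return out

s3 = np.sqrt(3.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

for J in range(1, 11):
    # Four midpoint samples per segment: exact for a piecewise-constant function.
    dt = 2.0 ** (-(J + 2))
    t = (np.arange(int(3 / dt)) + 0.5) * dt   # covers [0, 3] = [0, L-1]
    energy = np.sum(phi_J(t, g_d4, J) ** 2) * dt
    print(J, energy, phi_J(1.0, g_d4, J)[0])
```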



Scaling Function in the Fourier Domain We now find the Fourier transform of $\varphi^{(J)}(t)$, denoted by $\Phi^{(J)}(\omega)$. The function $\varphi^{(J)}(t)$ is a linear combination of box functions, each of width $1/2^J$ and height $2^{J/2}$, where the unit box function is equal to the Haar scaling function (12.6), with the Fourier transform $\Phi_h(\omega)$ as in (12.7). Using the scaling-in-time property of the Fourier transform (3.58a), the transform of a box function on the interval $[0, 1/2^J)$ of height $2^{J/2}$ is

$$\Phi_h^{(J)}(\omega) = 2^{-J/2}\, e^{-j\omega/2^{J+1}}\, \frac{\sin(\omega/2^{J+1})}{\omega/2^{J+1}}. \tag{12.43}$$

Shifting the $n$th box to start at $t = n/2^J$ multiplies its Fourier transform by $e^{-j\omega n/2^J}$. Putting it all together, we find



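$$\begin{aligned} \Phi^{(J)}(\omega) &= \Phi_h^{(J)}(\omega) \sum_{n=0}^{L^{(J)}-1} e^{-j\omega n/2^J} g_n^{(J)} \overset{(a)}{=} \Phi_h^{(J)}(\omega)\, G^{(J)}(e^{j\omega/2^J}) \\ &\overset{(b)}{=} \Phi_h^{(J)}(\omega) \prod_{\ell=0}^{J-1} G(e^{j\omega 2^\ell/2^J}) \overset{(c)}{=} \Phi_h^{(J)}(\omega) \prod_{\ell=1}^{J} G(e^{j\omega/2^\ell}), \end{aligned} \tag{12.44}$$

where (a) follows from the definition of the DTFT (2.78a); (b) from (12.37a); and (c) from reversing the order of the factors in the product.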
In the sequel, we will be interested in what happens in the limit, when $J \to \infty$. For any finite $\omega$, the effect of the interpolation function $\Phi_h^{(J)}(\omega)$ becomes negligible as $J \to \infty$. Indeed, in (12.43), both terms dependent on $\omega$ tend to 1 as $J \to \infty$ and only the factor $2^{-J/2}$ remains. So, in (12.44), the key term is the product, which becomes an infinite product, which we now define:

Definition 12.3 (Fourier transform of the infinite product) We call $\Phi(\omega)$ the limit, if it exists, of the infinite product:

$$\Phi(\omega) = \lim_{J\to\infty} \Phi^{(J)}(\omega) = \prod_{\ell=1}^{\infty} 2^{-1/2}\, G(e^{j\omega/2^\ell}). \tag{12.45}$$



The corollary to Proposition 12.1, (12.41), is now clear: if $G(1) > \sqrt{2}$, $\Phi(0)$ would grow unbounded, and if $G(1) < \sqrt{2}$, $\Phi(0)$ would be zero, contradicting the fact that $\Phi(\omega)$ is the limit of lowpass filters and hence a lowpass function.
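Truncations of the infinite product (12.45) are easy to evaluate numerically. The NumPy sketch below (helper names are ours) checks that for the Haar filter the partial products $\prod_{\ell=1}^{J} 2^{-1/2} G(e^{j\omega/2^\ell})$ approach $e^{-j\omega/2}\sin(\omega/2)/(\omega/2)$, the Haar result (12.7). It also illustrates the corollary just discussed: after rescaling the filter so that $G(1) \neq \sqrt{2}$, the value at $\omega = 0$ behaves like $(G(1)/\sqrt{2})^J$.

```python
import numpy as np

def dtft(g, w):
    """G(e^{jw}) = sum_n g_n e^{-jwn} for a causal FIR filter g, at the frequencies w."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    return np.exp(-1j * np.outer(w, np.arange(len(g)))) @ g

def partial_product(g, w, J):
    """prod_{l=1}^{J} 2^(-1/2) G(e^{jw/2^l}), the J-term truncation of (12.45)."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    out = np.ones(len(w), dtype=complex)
    for l in range(1, J + 1):
        out *= dtft(g, w / 2 ** l) / np.sqrt(2.0)
    return out

g_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
w = np.array([0.5, 3.0, 10.0, 40.0])
closed_form = np.exp(-1j * w / 2) * np.sin(w / 2) / (w / 2)   # Haar Phi(w), cf. (12.7)
print(np.max(np.abs(partial_product(g_haar, w, 40) - closed_form)))

# If G(1) != sqrt(2), the value at w = 0 drifts geometrically with J.
for scale in (0.99, 1.0, 1.01):
    print(scale, abs(partial_product(scale * g_haar, 0.0, 200)[0]))
```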

A more difficult question is to understand when the limits of the time-domain iteration $\varphi^{(J)}(t)$ (12.40) and the Fourier-domain iteration $\Phi^{(J)}(\omega)$ (12.44) form a Fourier-transform pair. We will show in Example 12.1 that this is a nontrivial question. As the exact conditions are technical and beyond the scope of our text, we concentrate on those cases when the limits in Definitions 12.2 and 12.3 are well defined and form a Fourier-transform pair, that is, when

$$\varphi(t) \;\longleftrightarrow\; \Phi(\omega).$$

We now look into the behavior of the infinite product. If $\Phi(\omega)$ decays sufficiently fast in $\omega$, the scaling function $\varphi(t)$ will be smooth. How this can be achieved while maintaining other desirable properties (such as compact support and orthogonality) is the key result for designing wavelet bases from iterated filter banks.

Example 12.1 (Fourier transform of the infinite product) To gain intuition, we now look into examples of filters and their associated infinite products.

(i) Daubechies filter with two zeros at $\omega = \pi$: We continue with our example of the Daubechies lowpass filter from (9.10) with its associated piecewise-constant function in Figure 12.19. In the Fourier-domain product (12.44), the terms are periodic with periods $4\pi, 8\pi, \ldots, 2^J 2\pi$, since $G(e^{j\omega})$ is $2\pi$-periodic (see Figure 12.20(a)-(c)). We show the product in part (d) of the figure. The terms are oscillating depending on their periodicity, but the product decays rather nicely. We will study this decay in detail shortly.
(ii) Length-4 filter designed using lowpass approximation method: Consider the orthogonal filter designed using the window method in Example 7.2. This filter does not have a zero at $\omega = \pi$, since

$$|G(e^{j\omega})|\big|_{\omega=\pi} \approx 0.389.$$

Its iteration is shown in Figure 12.21 with noticeable high-frequency oscillations, prohibiting convergence of the iterated function $\varphi^{(J)}(t)$.
Figure 12.20: Factors (a) $|G(e^{j\omega/2})|$, (b) $|G(e^{j\omega/4})|$, and (c) $|G(e^{j\omega/8})|$ that appear in (d) the Fourier-domain product $\Phi^{(J)}(\omega)$.

Figure 12.21: Iteration of a filter without a zero at $\omega = \pi$. The high-frequency oscillations prohibit the convergence of the iterated function $\varphi^{(J)}(t)$.



(iii) Stretched Haar filter: Instead of the standard Haar filter, consider

$$G(z) = \frac{1}{\sqrt{2}}\left(1 + z^{-3}\right).$$

It is clearly an orthogonal lowpass filter and has one zero at $\omega = \pi$. However, unlike the Haar filter, its iteration is highly unsmooth. Consider the equivalent filter after $J$ stages of iteration:

$$g^{(J)} = \frac{1}{2^{J/2}}\, [\,1\ 0\ 0\ 1\ 0\ 0\ \cdots\ 1\ 0\ 0\ 1\,].$$

The piecewise-constant function $\varphi^{(J)}(t)$ inherits this lack of smoothness, and does not converge pointwise to a proper limit, as shown graphically in Figure 12.22. Considering the frequency domain and the infinite product, it turns out that $L^2$ convergence fails as well (see Exercise 12.2).
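The pattern of the stretched Haar iterates can be checked directly. Here is a minimal NumPy sketch that reuses the upsample-and-convolve form of (12.37a) (helper name ours). It shows that the nonzero taps of $g^{(J)}$ sit exactly at multiples of 3 and all equal $2^{-J/2}$. The heights $2^{J/2} g_n^{(J)}$ of $\varphi^{(J)}(t)$ are therefore 1 on every third segment of width $2^{-J}$ and 0 on the other two, so the iterates keep oscillating on ever finer scales instead of settling to a pointwise limit.

```python
import numpy as np

def iterated_lowpass(g, J):
    """Taps of G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l)), as in (12.37a)."""
    gJ = np.array([1.0])
    for l in range(J):
        up = np.zeros((len(g) - 1) * 2 ** l + 1)
        up[:: 2 ** l] = g
        gJ = np.convolve(gJ, up)
    return gJ

g_stretched = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)   # (1 + z^-3)/sqrt(2)

for J in range(1, 6):
    gJ = iterated_lowpass(g_stretched, J)
    support = np.flatnonzero(np.abs(gJ) > 1e-12)
    heights = 2 ** (J / 2) * gJ   # segment heights of phi^(J)(t), cf. (12.40)
    print(J, support[:6], np.unique(np.round(heights, 12)))
```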

The examples above show that iterated filters and their associated graphical functions behave quite differently. The Haar case we saw in the previous section was trivial, the 4-tap filters showed a smooth behavior, and the stretched Haar filter pointed out potential convergence problems.

Figure 12.22: Iteration of the stretched Haar filter with impulse response $[1/\sqrt{2}\ 0\ 0\ 1/\sqrt{2}]$. (a) $\varphi^{(1)}(t)$. (b) $\varphi^{(2)}(t)$. (c) $\varphi^{(3)}(t)$.

In the sequel, we concentrate on orthonormal filters with $N \ge 1$ zeros at $\omega = \pi$, or

$$G(e^{j\omega}) = \left(\frac{1 + e^{-j\omega}}{2}\right)^N R(e^{j\omega}), \tag{12.46}$$

with $R(e^{j\omega})\big|_{\omega=0} = \sqrt{2}$ for the limit to exist. We assume (1) pointwise convergence of the iterated function $\varphi^{(J)}(t)$ to $\varphi(t)$, (2) pointwise convergence of the iterated Fourier-domain function $\Phi^{(J)}(\omega)$ to $\Phi(\omega)$, and finally (3) that $\varphi(t)$ and $\Phi(\omega)$ are a Fourier-transform pair. In other words, we avoid all convergence issues and concentrate on the well-behaved cases exclusively.



Two-Scale Equation We have seen in Section 12.1.1 that the Haar scaling function satisfies a two-scale equation (12.8). This is true in general, except that more terms will be involved in the summation. To show this, we start with the Fourier-domain limit of the infinite product (12.45):

$$\begin{aligned} \Phi(\omega) &= \prod_{\ell=1}^{\infty} 2^{-1/2} G(e^{j\omega/2^\ell}) \overset{(a)}{=} 2^{-1/2} G(e^{j\omega/2}) \prod_{\ell=2}^{\infty} 2^{-1/2} G(e^{j\omega/2^\ell}) \\ &\overset{(b)}{=} 2^{-1/2} G(e^{j\omega/2})\, \Phi(\omega/2) \overset{(c)}{=} 2^{-1/2} \sum_{n=0}^{L-1} g_n e^{-j\omega n/2}\, \Phi(\omega/2) \\ &\overset{(d)}{=} \sum_{n=0}^{L-1} g_n \left[ 2^{-1/2} e^{-j\omega n/2}\, \Phi(\omega/2) \right], \end{aligned} \tag{12.47}$$

where in (a) we took one factor out, $2^{-1/2} G(e^{j\omega/2})$; in (b) we recognize the infinite product as $\Phi(\omega/2)$; (c) follows from the definition of the DTFT (2.78a); and in (d) we just rearranged the terms. Then, using the scaling-in-time property of the Fourier transform (3.58a), $\Phi(\omega/2) \longleftrightarrow 2\varphi(2t)$, and the shift-in-time property (3.56), $e^{-j\omega n/2} X(\omega) \longleftrightarrow x(t - n/2)$, we get the two-scale equation:

$$\varphi(t) = \sqrt{2} \sum_{n=0}^{L-1} g_n\, \varphi(2t - n), \tag{12.48}$$

shown in Figure 12.23 for the Daubechies 4-tap filter from Examples 12.1 and 12.2.

Figure 12.23: Two-scale equation for the Daubechies scaling function. (a) The scaling function $\varphi(t)$ and (b) expressed as a linear combination of $\varphi(2t - n)$.
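Restricted to integer $t$, the two-scale equation (12.48) becomes a small eigenvalue problem for the values $\varphi(1), \ldots, \varphi(L-2)$. This standard technique is swapped in here for illustration; it is not the iteration (12.40) used in the text. The NumPy sketch assumes a continuous $\varphi(t)$ supported on $[0, L-1]$, so that $\varphi(0) = \varphi(L-1) = 0$. It also assumes the normalization $\sum_k \varphi(k) = 1$, which comes from $\Phi(0) = 1$ together with the zeros of $\Phi(\omega)$ at $\omega = 2\pi m$, $m \neq 0$ (via the Poisson sum formula). For the 4-tap Daubechies filter (9.10), it returns $\varphi(1) = (1+\sqrt{3})/2$ and $\varphi(2) = (1-\sqrt{3})/2$.

```python
import numpy as np

def phi_at_integers(g):
    """Values [phi(1), ..., phi(L-2)] of the scaling function from the two-scale equation (12.48).

    At integer t, phi(k) = sqrt(2) * sum_n g_n phi(2k - n). With phi(0) = phi(L-1) = 0 this
    reads v = M v, M[k, m] = sqrt(2) g_{2k-m}, so v is an eigenvector of M for eigenvalue 1.
    """
    L = len(g)
    idx = range(1, L - 1)
    M = np.array([[np.sqrt(2.0) * g[2 * k - m] if 0 <= 2 * k - m < L else 0.0
                   for m in idx] for k in idx])
    eigvals, eigvecs = np.linalg.eig(M)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()   # normalization sum_k phi(k) = 1

s3 = np.sqrt(3.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
print(phi_at_integers(g_d4))
```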



Smoothness As seen earlier, the key is to understand the infinite product (12.45), which becomes, using (12.46),

$$\Phi(\omega) = \prod_{\ell=1}^{\infty} 2^{-1/2} \left(\frac{1 + e^{-j\omega/2^\ell}}{2}\right)^N R(e^{j\omega/2^\ell}) = \underbrace{\left(\prod_{\ell=1}^{\infty} \frac{1 + e^{-j\omega/2^\ell}}{2}\right)^N}_{A(\omega)} \underbrace{\prod_{\ell=1}^{\infty} 2^{-1/2} R(e^{j\omega/2^\ell})}_{B(\omega)}. \tag{12.49}$$

Our goal is to see if $\Phi(\omega)$ has a sufficiently fast decay for large $\omega$. We know from Chapter 3, (3.79a), that if $|\Phi(\omega)|$ decays faster than $1/|\omega|$ for large $|\omega|$, then $\varphi(t)$ is bounded and continuous. Consider first the product

$$\prod_{\ell=1}^{\infty} \frac{1 + e^{-j\omega/2^\ell}}{2} \overset{(a)}{=} \prod_{\ell=1}^{\infty} 2^{-1/2}\, \frac{1 + e^{-j\omega/2^\ell}}{\sqrt{2}} \overset{(b)}{=} e^{-j\omega/2}\, \frac{\sin(\omega/2)}{\omega/2},$$

where in (a) we extracted the Haar filter (12.1a), and (b) follows from (12.45) as well as the Haar case (12.7). The decay of this Fourier transform is of order $O(1/|\omega|)$, and thus, $A(\omega)$ in (12.49) decays as $O(1/|\omega|^N)$.¹⁵⁵ So, as long as $|B(\omega)|$ does not grow faster than $|\omega|^{N-1-\epsilon}$, $\epsilon > 0$, the product (12.49) will decay fast enough to satisfy (3.79a), leading to a continuous scaling function $\varphi(t)$. We formalize this discussion in the following proposition, the proof of which is given in Solved Exercise 12.2.



155 In the time domain, it is the convolution of $N$ box functions, or a B-spline of order $N - 1$ (see Chapter 5).
Proposition 12.4 (Smoothness of the scaling function) With $R(e^{j\omega})$ as in (12.46), if

$$B = \sup_{\omega \in [0, 2\pi]} |R(e^{j\omega})| < 2^{N-1/2}, \tag{12.50}$$

then, as $J \to \infty$, the iterated function $\varphi^{(J)}(t)$ converges pointwise to a continuous function $\varphi(t)$ with the Fourier transform

$$\Phi(\omega) = \prod_{\ell=1}^{\infty} 2^{-1/2}\, G(e^{j\omega/2^\ell}).$$

Condition (12.50) is sufficient, but not necessary: many filters fail the test but still lead to continuous limits (and more sophisticated tests can be used). If we strengthened the bound to

$$B < 2^{N-k-1/2}, \qquad k \in \mathbb{N},$$

then $\varphi(t)$ would be continuous and $k$-times differentiable (see Exercise TBD).

Example 12.2 (Smoothness of the scaling function) We now test the continuity condition (12.50) on the two filters we have used most often.
The Haar filter

$$G(e^{j\omega}) = \frac{1}{\sqrt{2}}\left(1 + e^{-j\omega}\right) = \left(\frac{1 + e^{-j\omega}}{2}\right)\sqrt{2},$$

has $N = 1$ zero at $\omega = \pi$ and $R(z) = \sqrt{2}$. Thus, $B = \sqrt{2}$, which does not meet the inequality in (12.50). According to Proposition 12.4, $\varphi(t)$ may or may not be continuous (and we know it is not).
The Daubechies filter (9.10)

$$G(e^{j\omega}) = \left(\frac{1 + e^{-j\omega}}{2}\right)^2 \frac{1}{\sqrt{2}}\left(1 + \sqrt{3} + (1 - \sqrt{3})\, e^{-j\omega}\right),$$

has $N = 2$ zeros at $\omega = \pi$. Since $|R(e^{j\omega})|^2 = 4 - 2\cos\omega$, the supremum of $|R(e^{j\omega})|$ is attained at $\omega = \pi$,

$$B = \sup_{\omega} |R(e^{j\omega})| = \sqrt{6} < 2^{3/2},$$

and thus, the scaling function $\varphi(t)$ must be continuous.
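The bound $B$ in (12.50) can also be estimated numerically. In this minimal NumPy sketch (helper names ours), `remainder_factor` divides out $((1+z^{-1})/2)^N$ with `np.polydiv`, and `sup_R` takes the maximum of $|R(e^{j\omega})|$ over a frequency grid that contains $\omega = \pi$. In general, a grid maximum only bounds the supremum from below. Here it is exact, because for both filters the maximum is attained at a grid point.

```python
import numpy as np

def remainder_factor(g, N):
    """Taps r_n (powers of z^-1) of R(z) in (12.46), where G(z) = ((1 + z^-1)/2)^N R(z)."""
    d = np.array([1.0])
    for _ in range(N):
        d = np.convolve(d, [0.5, 0.5])
    # np.polydiv wants the highest power first; in the variable x = z^-1 that is the reversed array.
    q, rem = np.polydiv(np.asarray(g, dtype=float)[::-1], d[::-1])
    if not np.allclose(rem, 0.0, atol=1e-12):
        raise ValueError("G(z) does not have N zeros at w = pi")
    return q[::-1]

def sup_R(g, N, num=4097):
    """Grid estimate of B = sup |R(e^{jw})| over [0, 2 pi]; the grid contains w = pi."""
    w = np.linspace(0.0, 2 * np.pi, num)
    r = remainder_factor(g, N)
    R = np.exp(-1j * np.outer(w, np.arange(len(r)))) @ r
    return np.max(np.abs(R))

s3 = np.sqrt(3.0)
g_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
for name, g, N in (("Haar", g_haar, 1), ("Daubechies", g_d4, 2)):
    B = sup_R(g, N)
    passed = B < 2 ** (N - 0.5) - 1e-9   # strict inequality, with a margin against round-off
    print(f"{name}: B = {B:.6f}, 2^(N-1/2) = {2 ** (N - 0.5):.6f}, condition (12.50) met: {passed}")
```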

Reproduction of Polynomials We have seen in Chapter 5 that splines of order $N$ and their shifts can reproduce polynomials of degree up to $N$. Given that the scaling functions based on a filter having $N$ zeros at $\omega = \pi$ contain a spline part of order $(N-1)$, linear combinations of $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$ can reproduce polynomials of degree $(N-1)$. We illustrate this property in Figure 12.24, where the Daubechies filter with two zeros at $\pi$, (9.10), reproduces the linear function $x(t) = t$. (We give the proof of this property later in the chapter.)

Figure 12.24: An example of the reproduction of polynomials by the scaling function and its shifts. The scaling function $\varphi(t)$ is based on the Daubechies filter with two zeros at $\pi$, (9.10), and reproduces the linear function $x(t) = t$ (on an interval because of the finite number of scaling functions used).

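Orthogonality to Integer Shifts As we have seen in the Haar case already, the scaling function is orthogonal to its integer shifts, a property inherited from the underlying filter:

$$\langle \varphi(t), \varphi(t-n) \rangle_t = \delta_n. \tag{12.51}$$

Since $\varphi(t)$ is defined through a limit and the inner product is continuous in both arguments, orthogonality (12.51) follows from the orthogonality of $\varphi^{(J)}(t)$ and its integer shifts for any $J$:

$$\langle \varphi^{(J)}(t), \varphi^{(J)}(t-n) \rangle_t = \delta_n, \qquad J \in \mathbb{Z}^+, \tag{12.52}$$

which follows, in turn, from the same property for the iterated filter $g_n^{(J)}$ in (9.6a):

$$\begin{aligned} \langle \varphi^{(J)}(t), \varphi^{(J)}(t-n) \rangle_t &\overset{(a)}{=} \Big\langle \sum_{k=0}^{L^{(J)}-1} g_k^{(J)} 2^{J/2} \varphi_h(2^J t - k),\ \sum_{m=0}^{L^{(J)}-1} g_m^{(J)} 2^{J/2} \varphi_h(2^J (t-n) - m) \Big\rangle_t \\ &\overset{(b)}{=} \sum_{k=0}^{L^{(J)}-1} \sum_{m=0}^{L^{(J)}-1} g_k^{(J)} g_m^{(J)} \int_{-\infty}^{\infty} 2^J \varphi_h(2^J t - k)\, \varphi_h(2^J t - 2^J n - m)\, dt \\ &\overset{(c)}{=} \sum_{k=0}^{L^{(J)}-1} g_k^{(J)} g_{k - 2^J n}^{(J)} = \langle g_k^{(J)}, g_{k - 2^J n}^{(J)} \rangle_k \overset{(d)}{=} \delta_n, \end{aligned}$$

where (a) follows from (12.40); in (b) we took the sums and filter coefficients out of the inner product; (c) from the orthogonality of the Haar scaling functions; and (d) from the orthogonality of the filters themselves, (9.6a).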

The orthogonality (12.51) at scale 0 has counterparts at other scales:

$$\langle \varphi(2^\ell t), \varphi(2^\ell t - n) \rangle_t = 2^{-\ell} \delta_n, \tag{12.53}$$

easily verified by changing the integration variable.
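The chain of equalities above reduces (12.52) to the discrete orthonormality of $g^{(J)}$ with respect to shifts by $2^J$, property (9.6a). The NumPy sketch below (helper names ours) checks that discrete property numerically for the 4-tap Daubechies filter.

```python
import numpy as np

def iterated_lowpass(g, J):
    """Taps of G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l)), as in (12.37a)."""
    gJ = np.array([1.0])
    for l in range(J):
        up = np.zeros((len(g) - 1) * 2 ** l + 1)
        up[:: 2 ** l] = g
        gJ = np.convolve(gJ, up)
    return gJ

def lag_inner(a, lag):
    """<a_k, a_{k - lag}>_k for a finite real sequence a stored from index 0."""
    s = abs(lag)
    return float(np.dot(a[s:], a[:len(a) - s])) if s < len(a) else 0.0

s3 = np.sqrt(3.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
for J in range(1, 6):
    gJ = iterated_lowpass(g_d4, J)
    gram = [lag_inner(gJ, 2 ** J * n) for n in range(-3, 4)]   # should equal delta_n
    print(J, np.round(gram, 12))
```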
12.2.3 Wavelet Function and its Properties

The scaling function we have just seen is lowpass in nature (if the underlying filter $g$ is lowpass in nature). Similarly to what we have done for the scaling function, we can construct a wavelet function (or, simply, wavelet) that will be bandpass in nature (if the underlying filter $h$ is highpass in nature).

We thus associate a piecewise-constant function $\psi^{(J)}(t)$ to $h_n^{(J)}$, the impulse response of (12.37b), in such a way that $\psi^{(J)}(t)$ is of finite length and of norm 1; we use the same arguments as before to determine the width and height of the piecewise segments, leading to

$$\psi^{(J)}(t) = 2^{J/2} h_n^{(J)}, \qquad \frac{n}{2^J} \le t < \frac{n+1}{2^J}. \tag{12.54}$$

Unlike $\varphi^{(J)}(t)$, our new object of interest $\psi^{(J)}(t)$ is a bandpass function. In particular, because $H(e^{j\omega})\big|_{\omega=0} = 0$, its Fourier transform $\Psi^{(J)}(\omega)$ satisfies

$$\Psi^{(J)}(\omega)\big|_{\omega=0} = 0.$$

Again, we are interested in what happens when $J \to \infty$. Clearly, this involves an infinite product, but it is the same infinite product we studied for the convergence of $\varphi^{(J)}(t)$ towards $\varphi(t)$. In short, we assume this question to be settled. The development parallels the one for the scaling function, with the important twist of consistently replacing the lowpass filter $G(z^{2^{J-1}})$ by the highpass filter $H(z^{2^{J-1}})$. We do not repeat the details, but rather indicate the main points. Equation (12.44) becomes

$$\Psi^{(J)}(\omega) = \Phi_h^{(J)}(\omega)\, H(e^{j\omega/2}) \prod_{\ell=2}^{J} G(e^{j\omega/2^\ell}). \tag{12.55}$$

Similarly to the scaling function, we define the wavelet as the limit of $\psi^{(J)}(t)$ or $\Psi^{(J)}(\omega)$, where we now assume that both are well defined and form a Fourier-transform pair.



Definition 12.5 (Wavelet) Assuming the limit to exist, we define the wavelet in time and frequency domains to be

$$\psi(t) = \lim_{J\to\infty} \psi^{(J)}(t), \tag{12.56a}$$

$$\Psi(\omega) = \lim_{J\to\infty} \Psi^{(J)}(\omega). \tag{12.56b}$$

From (12.55) and using the steps leading to (12.45), we can write

$$\Psi(\omega) = 2^{-1/2} H(e^{j\omega/2}) \prod_{\ell=2}^{\infty} 2^{-1/2} G(e^{j\omega/2^\ell}). \tag{12.57}$$
Figure 12.25: Wavelet based on the Daubechies highpass filter (12.60). (a) Wavelet $\psi(t)$ and (b) the two-scale equation for the wavelet.



Two-Scale Equation Similarly to (12.47), we can rewrite (12.57) as

$$\Psi(\omega) = 2^{-1/2} H(e^{j\omega/2})\, \Phi(\omega/2). \tag{12.58}$$

Taking the inverse Fourier transform, we get a relation similar to (12.48), namely

$$\psi(t) = \sqrt{2} \sum_{n=0}^{L-1} h_n\, \varphi(2t - n), \tag{12.59}$$

the two-scale equation for the wavelet. From the support of $\varphi(t)$ in (12.39), it also follows that $\psi(t)$ has the same support on $[0, L-1]$. To illustrate the two-scale relation and also show a wavelet, consider the following example.

Example 12.3 (Wavelet and the two-scale equation) Take the Daubechies lowpass filter (9.10) and construct its highpass via (7.24). It has a double zero at $\omega = 0$, and is given by:

$$H(z) = \frac{1}{4\sqrt{2}} \left[ (\sqrt{3} - 1) + (3 - \sqrt{3})\, z^{-1} - (3 + \sqrt{3})\, z^{-2} + (1 + \sqrt{3})\, z^{-3} \right]. \tag{12.60}$$

Figure 12.25 shows the wavelet $\psi(t)$ and the two-scale equation.
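The highpass taps (12.60) can be generated from the lowpass taps. The NumPy sketch below assumes that (7.24) has the alternating-flip form $H(z) = z^{-(L-1)} G(-z^{-1})$, that is, $h_k = (-1)^{L-1-k} g_{L-1-k}$; this form does reproduce (12.60) from (9.10). The discrete moments $\sum_k k^p h_k$ vanish for $p = 0, 1$, which reflects the double zero of $H(e^{j\omega})$ at $\omega = 0$, while the moment for $p = 2$ does not.

```python
import numpy as np

def alternating_flip(g):
    """h_k = (-1)^(L-1-k) g_(L-1-k), i.e. H(z) = z^-(L-1) G(-z^-1) (our assumed form of (7.24))."""
    L = len(g)
    k = np.arange(L)
    return (-1.0) ** (L - 1 - k) * g[::-1]

s3 = np.sqrt(3.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))        # (9.10)
h_d4 = alternating_flip(g_d4)
h_1260 = np.array([s3 - 1, 3 - s3, -(3 + s3), 1 + s3]) / (4 * np.sqrt(2.0))   # (12.60)

k = np.arange(len(h_d4))
moments = [np.sum(k ** p * h_d4) for p in range(3)]   # p = 0, 1 vanish; p = 2 does not
print(np.allclose(h_d4, h_1260), np.round(moments, 12), np.dot(g_d4, h_d4))
```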

Smoothness Since the wavelet is a finite linear combination of scaling functions and their shifts as in (12.59), the smoothness is inherited from the scaling function, as illustrated in Figure 12.25(a).

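Zero-Moment Property We assumed that the lowpass filter $G(e^{j\omega})$ had $N$ zeros ($N \ge 1$) at $\omega = \pi$. Using (7.24) and applying it to (12.46), we get

$$H(e^{j\omega}) = e^{-j(L-1)\omega} \left(\frac{1 - e^{j\omega}}{2}\right)^N R(e^{-j(\omega+\pi)}). \tag{12.61}$$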
It has therefore $N$ zeros at $\omega = 0$. These $N$ zeros carry over directly to $\Psi(\omega)$ because of (12.58) and $\Phi(\omega)\big|_{\omega=0} = 1$. Because of this,

$$\frac{d^n \Psi(\omega)}{d\omega^n}\Big|_{\omega=0} = 0, \qquad n = 0, 1, \ldots, N-1. \tag{12.62}$$

We can now use the moment property of the Fourier transform, (3.63a), to find the Fourier-transform pair of the above equation, leading to

$$\int_{-\infty}^{\infty} t^n \psi(t)\, dt = 0, \qquad n = 0, 1, \ldots, N-1. \tag{12.63}$$

In other words, if $p(t)$ is a polynomial function of degree at most $(N-1)$, its inner product with the wavelet at any shift and/or scale will be 0:

$$\langle p(t), \psi(at - b) \rangle_t = 0 \qquad \text{for all } a, b \in \mathbb{R},\ a \neq 0. \tag{12.64}$$

Remembering that $\varphi(t)$ is able to reproduce polynomials up to degree $(N-1)$, it is a good role split for the two functions: wavelets annihilate polynomial functions while scaling functions reproduce them.
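For the piecewise-constant iterates $\psi^{(J)}(t)$, the moments in (12.63) are finite sums. Integrating (12.54) segment by segment gives $\int \psi^{(J)}(t)\,dt = 2^{-J/2}\sum_n h_n^{(J)}$ and $\int t\,\psi^{(J)}(t)\,dt = 2^{-3J/2}\sum_n (n + \tfrac12) h_n^{(J)}$. The NumPy sketch below (helper names ours) evaluates these for the Daubechies pair and finds both equal to zero up to round-off at every $J$, because $H^{(J)}(z)$ inherits the double zero of $H(z)$ at $z = 1$. The second moment, computed the same way, stays at a nonzero value.

```python
import numpy as np

def upsample(h, factor):
    """Taps of H(z**factor): insert factor - 1 zeros between consecutive taps."""
    out = np.zeros((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out

def iterated_highpass(g, h, J):
    """Taps of H^(J)(z) = G^(J-1)(z) H(z^(2^(J-1))), as in (12.37b)."""
    taps = np.array([1.0])
    for l in range(J - 1):
        taps = np.convolve(taps, upsample(g, 2 ** l))
    return np.convolve(taps, upsample(h, 2 ** (J - 1)))

def psi_J_moments(g, h, J):
    """Exact moments int t^p psi^(J)(t) dt, p = 0, 1, 2, of the piecewise-constant psi^(J) in (12.54).

    On [n/2^J, (n+1)/2^J) the function equals 2^(J/2) h_n^(J); integrating t^p over that
    segment gives the closed-form weights used below."""
    hJ = iterated_highpass(g, h, J)
    n = np.arange(len(hJ), dtype=float)
    m0 = 2 ** (-J / 2) * np.sum(hJ)
    m1 = 2 ** (-3 * J / 2) * np.sum((n + 0.5) * hJ)
    m2 = 2 ** (-5 * J / 2) * np.sum((n ** 2 + n + 1 / 3) * hJ)
    return m0, m1, m2

s3 = np.sqrt(3.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))    # (9.10)
h_d4 = np.array([s3 - 1, 3 - s3, -(3 + s3), 1 + s3]) / (4 * np.sqrt(2.0))   # (12.60)
for J in range(1, 7):
    print(J, ["%.2e" % m for m in psi_J_moments(g_d4, h_d4, J)])
```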

Orthogonality to Integer Shifts In our quest towards building orthonormal bases of wavelets, we will need the wavelet to be orthogonal to its integer shifts. The derivation is analogous to that for the scaling function; we thus skip it here, and instead just summarize this and other orthogonality conditions:

$$\langle \psi(2^\ell t), \psi(2^\ell t - n) \rangle_t = 2^{-\ell} \delta_n, \tag{12.65a}$$

$$\langle \varphi(2^\ell t), \psi(2^\ell t - n) \rangle_t = 0. \tag{12.65b}$$

12.2.4 Scaling Function and Wavelets from Biorthogonal Filter Banks

As we have already seen with filter banks, not all cases of interest are necessarily orthogonal. In Chapter 7 we designed biorthogonal filter banks to obtain symmetric/antisymmetric FIR filters. Similarly, with wavelets, except for the Haar case, there exist no orthonormal and compactly-supported wavelet bases that are symmetric/antisymmetric. Since symmetry is often a desirable feature, we need to relax orthonormality. We thus set the stage here for the biorthogonal wavelet series by briefly going through the necessary concepts.

To start, we assume a quadruple $(h_n, g_n, \tilde h_n, \tilde g_n)$ of biorthogonal impulse responses satisfying the four biorthogonality relations (7.64a)-(7.64d). We further require that both lowpass filters have at least one zero at $\omega = \pi$, and more if possible:

$$G(e^{j\omega}) = \left(\frac{1 + e^{-j\omega}}{2}\right)^N R(e^{j\omega}), \qquad \tilde G(e^{j\omega}) = \left(\frac{1 + e^{-j\omega}}{2}\right)^{\tilde N} \tilde R(e^{j\omega}). \tag{12.66}$$
Since the highpass filters are related to the lowpass ones by (7.75), the highpass filters $H(e^{j\omega})$ and $\tilde H(e^{j\omega})$ will have $\tilde N$ and $N$ zeros at $\omega = 0$, respectively. In the biorthogonal case, unlike in the orthonormal one, there is no implicit normalization, so we will assume that

$$G(e^{j\omega})\big|_{\omega=0} = \tilde G(e^{j\omega})\big|_{\omega=0} = \sqrt{2},$$

which can be enforced by normalizing $H(e^{j\omega})$ and $\tilde H(e^{j\omega})$ accordingly. Analogously to the orthogonal case, the iterated filters are given by

$$G^{(J)}(z) = \prod_{\ell=0}^{J-1} G(z^{2^\ell}), \qquad \tilde G^{(J)}(z) = \prod_{\ell=0}^{J-1} \tilde G(z^{2^\ell}),$$

and we define scaling functions in the Fourier domain as

$$\Phi(\omega) = \prod_{\ell=1}^{\infty} 2^{-1/2} G(e^{j\omega/2^\ell}), \qquad \tilde\Phi(\omega) = \prod_{\ell=1}^{\infty} 2^{-1/2} \tilde G(e^{j\omega/2^\ell}). \tag{12.67}$$

In the sequel, we will concentrate on well-behaved cases only, that is, when the infinite products are well defined. Also, the iterated time-domain functions corresponding to $G^{(J)}(z)$ and $\tilde G^{(J)}(z)$ have well-defined limits $\varphi(t)$ and $\tilde\varphi(t)$, respectively, related to (12.67) by the Fourier transform.

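The two-scale relations follow similarly to the orthogonal case:

$$\Phi(\omega) = 2^{-1/2} G(e^{j\omega/2})\, \Phi(\omega/2), \tag{12.68a}$$

$$\varphi(t) = \sqrt{2} \sum_{n=0}^{L-1} g_n\, \varphi(2t - n), \tag{12.68b}$$

as well as

$$\tilde\Phi(\omega) = 2^{-1/2} \tilde G(e^{j\omega/2})\, \tilde\Phi(\omega/2), \tag{12.69a}$$

$$\tilde\varphi(t) = \sqrt{2} \sum_{n=0}^{L-1} \tilde g_n\, \tilde\varphi(2t - n). \tag{12.69b}$$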



Example 12.4 (Scaling function and wavelets from linear B-splines) Choose as the lowpass filter

$$G(e^{j\omega}) = \sqrt{2}\, e^{j\omega} \left(\frac{1 + e^{-j\omega}}{2}\right)^2 = \frac{1}{2\sqrt{2}} \left(e^{j\omega} + 2 + e^{-j\omega}\right),$$

which has a double zero at $\omega = \pi$ and satisfies the normalization $G(e^{j\omega})\big|_{\omega=0} = \sqrt{2}$. Then, using (12.67), we compute $\Phi(\omega)$ to be (3.49b), that is, the Fourier transform of the hat function (3.49a), or linear B-spline. This is because $G(z)$ is (up to a normalization and shift) the convolution of the Haar filter with itself.
Figure 12.26: The hat function and its dual. (a) $\varphi(t)$ from the iteration of $\frac{1}{2\sqrt{2}}[1, 2, 1]$. (b) $\tilde\varphi(t)$ from the iteration of $\frac{1}{4\sqrt{2}}[-1, 2, 6, 2, -1]$.



Thus, the limit of the iterated filter is the convolution of the box function with 
itself, the result being shifted to be centered at the origin. 

We now search for a biorthogonal scaling function $\tilde\varphi(t)$ by finding first a (nonunique) biorthogonal lowpass filter $\tilde G(z)$ satisfying (7.66). Besides the trivial solution $\tilde G(z) = 1$, the following is a solution as well:

$$\tilde G(z) = \frac{1}{4\sqrt{2}} (1 + z)(1 + z^{-1})(-z + 4 - z^{-1}) = \frac{1}{4\sqrt{2}} \left(-z^2 + 2z + 6 + 2z^{-1} - z^{-2}\right),$$

obtained as one possible factorization of $C(z)$ from Example 7.4. The resulting dual scaling function $\tilde\varphi(t)$ looks quite irregular (see Figure 12.26(b)). We could, instead, look for a $\tilde G(z)$ with more zeros at $\omega = \pi$ to obtain a smoother dual scaling function. For example, choose

$$\tilde G(z) = \frac{1}{64\sqrt{2}} (1 + z)^2 (1 + z^{-1})^2 \left(3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2}\right),$$

leading to quite a different $\tilde\varphi(t)$ (see Figure 12.27). Choosing the highpass filters as in (7.75),

$$H(z) = z\, \tilde G(-z^{-1}), \qquad \tilde H(z) = z^{-1} G(-z),$$

with only a minimal shift, since the lowpass filters are centered around the origin and symmetric, we get all four functions as in Figure 12.27.
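Here is a quick numerical check of Example 12.4 as a NumPy sketch. We assume that (7.66) amounts to the usual lowpass biorthogonality condition $\sum_n g_n \tilde g_{n-2k} = \delta_k$ for filters indexed around the origin; the helper `lag_inner` and the centering convention are ours. Both dual filters in the example pass this test and share the normalization $\sum_n g_n = \sum_n \tilde g_n = \sqrt{2}$. The dual taps are built from the factorizations given in the text rather than typed in.

```python
import numpy as np

def lag_inner(a, a_center, b, b_center, lag):
    """sum_n a_n b_(n - lag), where array position i holds index n = i - center."""
    total = 0.0
    for i, a_n in enumerate(a):
        j = (i - a_center) - lag + b_center   # array position of b_(n - lag)
        if 0 <= j < len(b):
            total += a_n * b[j]
    return total

r2 = np.sqrt(2.0)
g = np.array([1.0, 2.0, 1.0]) / (2 * r2)                                    # n = -1..1
# (1+z)(1+z^-1) = z + 2 + z^-1, times (-z + 4 - z^-1):                        n = -2..2
g_dual_5 = np.convolve([1.0, 2.0, 1.0], [-1.0, 4.0, -1.0]) / (4 * r2)
# (1+z)^2 (1+z^-1)^2 = (z + 2 + z^-1)^2, times (3z^2 - 18z + 38 - 18z^-1 + 3z^-2):  n = -4..4
g_dual_9 = np.convolve([1.0, 4.0, 6.0, 4.0, 1.0], [3.0, -18.0, 38.0, -18.0, 3.0]) / (64 * r2)

for name, gd, center in (("5-tap dual", g_dual_5, 2), ("9-tap dual", g_dual_9, 4)):
    vals = [lag_inner(g, 1, gd, center, 2 * k) for k in range(-3, 4)]   # should equal delta_k
    print(name, np.round(vals, 12), g.sum(), gd.sum())
```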



12.3 Wavelet Series 

So far, we have considered only a single scale with the two functions $\varphi(t)$ and $\psi(t)$. Yet, as for the Haar case in Section 12.1, multiple scales are already lurking in the background through the two-scale equations (12.48), (12.59). And just like in the DWT in Chapter 9, the real action appears when all scales are considered, as we have already seen with the Haar wavelet series.
Figure 12.27: Biorthogonal linear spline basis. (a) The linear B-spline is the hat function $\varphi(t)$. (b) The linear B-spline wavelet $\psi(t)$. (c) The dual scaling function $\tilde\varphi(t)$. (d) The dual wavelet $\tilde\psi(t)$.

Figure 12.28: Example wavelets. (a) The prototype wavelet $\psi(t) = \psi_{0,0}(t)$; (b) $\psi_{-1,2}(t)$; (c) $\psi_{2,1}(t)$.



12.3.1 Definition of the Wavelet Series 

We thus recall

$$\psi_{\ell,k}(t) = 2^{-\ell/2}\, \psi(2^{-\ell} t - k) = \frac{1}{2^{\ell/2}}\, \psi\!\left(\frac{t - 2^\ell k}{2^\ell}\right), \tag{12.70a}$$

$$\varphi_{\ell,k}(t) = 2^{-\ell/2}\, \varphi(2^{-\ell} t - k) = \frac{1}{2^{\ell/2}}\, \varphi\!\left(\frac{t - 2^\ell k}{2^\ell}\right), \tag{12.70b}$$

for $\ell, k \in \mathbb{Z}$, with the understanding that the basic scaling function $\varphi(t)$ and the wavelet $\psi(t)$ are no longer Haar, but can be more general. As before, for $\ell = 0$, we have the usual scaling function and wavelet and their integer shifts; for $\ell > 0$, the functions are stretched by a power of 2, and the shifts are proportionally increased; and for $\ell < 0$, the functions are compressed by a power of 2, with appropriately reduced shifts. Both the scaling function and the wavelet are of unit norm at all scales (a few examples are given in Figure 12.28).
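The normalization and orthogonality claims can be checked numerically in the Haar case. There, every $\psi_{\ell,k}$ is piecewise constant on dyadic intervals, so a midpoint sum on a fine enough dyadic grid is exact. The NumPy sketch below (helper names ours) forms the Gram matrix of the three wavelets of Figure 12.28 together with three wavelets whose supports overlap that of $\psi_{0,0}$.

```python
import numpy as np

def haar_psi(s):
    """Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((s >= 0) & (s < 0.5), 1.0, 0.0) - np.where((s >= 0.5) & (s < 1), 1.0, 0.0)

def psi_lk(t, l, k):
    """psi_{l,k}(t) = 2^(-l/2) psi(2^(-l) t - k), as in (12.70a)."""
    return 2.0 ** (-l / 2) * haar_psi(2.0 ** (-l) * t - k)

h = 2.0 ** -6                                   # grid step; all breakpoints below lie on the grid
t = -8 + (np.arange(int(16 / h)) + 0.5) * h     # midpoints covering [-8, 8)
pairs = [(0, 0), (-1, 2), (2, 1), (1, 0), (-1, 0), (-1, 1)]
gram = np.array([[h * np.sum(psi_lk(t, *p) * psi_lk(t, *q)) for q in pairs] for p in pairs])
print(np.round(gram, 12))
```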
Two-Scale Equations at Nonconsecutive Scales Since we want to deal with multiple scales (not just two), we extend the two-scale equations for $\varphi(t)$ and $\psi(t)$ across arbitrary scales that are powers of 2:

$$\begin{aligned} \Phi(\omega) &\overset{(a)}{=} 2^{-1/2} G(e^{j\omega/2})\, \Phi(\omega/2) \overset{(b)}{=} 2^{-1} G(e^{j\omega/2})\, G(e^{j\omega/4})\, \Phi(\omega/4) \\ &\overset{(c)}{=} 2^{-1} G^{(2)}(e^{j\omega/4})\, \Phi(\omega/4) \overset{(d)}{=} 2^{-k/2} G^{(k)}(e^{j\omega/2^k})\, \Phi(\omega/2^k), \end{aligned} \tag{12.71a}$$

$$\varphi(t) = 2^{k/2} \sum_{n=0}^{L^{(k)}-1} g_n^{(k)}\, \varphi(2^k t - n), \tag{12.71b}$$

for $k \in \mathbb{Z}^+$, where both (a) and (b) follow from the two-scale equation in the Fourier domain, (12.47); (c) from the expression for the equivalent filter, (12.37a); and (d) by repeatedly applying the same (see Exercise 12.3). The last expression is obtained by applying the inverse Fourier transform to (12.71a).
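For truncated products, the identity (12.71a) holds exactly. Split $\prod_{\ell=1}^{J}$ into the first $k$ factors, which form $G^{(k)}(e^{j\omega/2^k})$ by (12.37a), and the remaining $J-k$ factors, which form the same truncated product evaluated at $\omega/2^k$. The NumPy sketch below (helper names ours) confirms this numerically for the Daubechies filter.

```python
import numpy as np

def dtft(taps, w):
    """sum_n taps_n e^{-jwn} for causal taps, at the frequencies w."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    return np.exp(-1j * np.outer(w, np.arange(len(taps)))) @ taps

def iterated_lowpass(g, J):
    """Taps of G^(J)(z) = prod_{l=0}^{J-1} G(z^(2^l)), as in (12.37a)."""
    gJ = np.array([1.0])
    for l in range(J):
        up = np.zeros((len(g) - 1) * 2 ** l + 1)
        up[:: 2 ** l] = g
        gJ = np.convolve(gJ, up)
    return gJ

def partial_product(g, w, J):
    """prod_{l=1}^{J} 2^(-1/2) G(e^{jw/2^l})."""
    out = np.ones(np.size(w), dtype=complex)
    for l in range(1, J + 1):
        out *= dtft(g, np.asarray(w) / 2 ** l) / np.sqrt(2.0)
    return out

s3 = np.sqrt(3.0)
g_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
w = np.linspace(-20, 20, 101)
J = 20
for k in range(1, 5):
    lhs = partial_product(g_d4, w, J)
    rhs = (2 ** (-k / 2) * dtft(iterated_lowpass(g_d4, k), w / 2 ** k)
           * partial_product(g_d4, w / 2 ** k, J - k))
    print(k, np.max(np.abs(lhs - rhs)))
```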

Using an analogous derivation for the wavelet, we get

$$\Psi(\omega) = 2^{-k/2} H^{(k)}(e^{j\omega/2^k})\, \Phi(\omega/2^k), \tag{12.72a}$$

$$\psi(t) = 2^{k/2} \sum_{n=0}^{L^{(k)}-1} h_n^{(k)}\, \varphi(2^k t - n), \tag{12.72b}$$

for $k = 2, 3, \ldots$. The attractiveness of the above expressions lies in their ability to express any $\varphi_{\ell,k}(t)$, $\psi_{\ell,k}(t)$ in terms of a linear combination of an appropriately scaled $\varphi(t)$, where the linear combination is given by the coefficients of an equivalent filter $g_n^{(k)}$ or $h_n^{(k)}$. We are now ready for the main result of this chapter:



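Theorem 12.6 (Orthonormal basis for $L^2(\mathbb{R})$) The continuous-time wavelet $\psi(t)$ satisfying (12.59) and its shifts and scales,

$$\{\psi_{\ell,k}(t)\} = \left\{ \frac{1}{\sqrt{2^\ell}}\, \psi\!\left(\frac{t - 2^\ell k}{2^\ell}\right) \right\}, \qquad \ell, k \in \mathbb{Z}, \tag{12.73}$$

form an orthonormal basis for the space of square-integrable functions, $L^2(\mathbb{R})$.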



Proof. To prove the theorem, we must prove that (i) $\{\psi_{\ell,k}(t)\}_{\ell,k\in\mathbb{Z}}$ is an orthonormal set and (ii) it is complete. The good news is that most of the hard work has already been done while studying the DWT in Theorem 9.2, Chapter 9.

(i) We have already shown that the wavelets and their shifts are orthonormal at a single scale, (12.65a), and need to show the same across scales:

$$\langle \psi_{\ell,k}(t), \psi_{m,n}(t) \rangle_t \overset{(a)}{=} \langle \psi_{i,k}(\tau), \psi_{0,n}(\tau) \rangle_\tau \overset{(b)}{=} \Big\langle \sum_{p} h_p^{(i)}\, \varphi_{0,\, 2^i k + p}(\tau),\ \psi_{0,n}(\tau) \Big\rangle_\tau \overset{(c)}{=} 0,$$

where (a) follows from assuming (without loss of generality) $\ell = m + i$, $i > 0$, and the change of variable $t = 2^m \tau$; (b) from the two-scale equation for the wavelet (12.72b); and (c) from the linearity of the inner product as well as the orthogonality of the wavelet and the scaling function (12.65b).
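(ii) The proof of completeness is more involved, and thus, we show it only for the Haar case. Further Reading gives pointers to texts with the full proof. Consider a unit-norm function $x(t)$ such that $x(t) = 0$ for $t < 0$, with finite length at most $2^J$ for some $J \in \mathbb{Z}$.¹⁵⁶ We approximate $x(t)$ by a piecewise-constant approximation at scale $\ell$ (where $\ell \ll J$), or

$$\begin{aligned} x^{(\ell)}(t) &= 2^{-\ell} \int_{2^\ell k}^{2^\ell (k+1)} x(\tau)\, d\tau, \qquad 2^\ell k \le t < 2^\ell (k+1), \\ &\overset{(a)}{=} \sum_{k\in\mathbb{Z}} \left( \int_{-\infty}^{\infty} x(\tau)\, \varphi_{\ell,k}(\tau)\, d\tau \right) \varphi_{\ell,k}(t) \overset{(b)}{=} \sum_{k\in\mathbb{Z}} \langle x, \varphi_{\ell,k} \rangle\, \varphi_{\ell,k}(t) \overset{(c)}{=} \sum_{k\in\mathbb{Z}} \alpha_k\, \varphi_{\ell,k}(t), \end{aligned} \tag{12.74a}$$

where (a) follows from (12.16b); (b) from the definition of the inner product; and in (c) we introduced $\alpha_k = \langle x, \varphi_{\ell,k} \rangle$.

Because of the finite-length assumption of $x(t)$, the sequence $\alpha_k$ is also of finite length (of order $2^{J-\ell}$). Since $x(t)$ is of norm 1 and the approximation in (12.74a) is a projection, $\|\alpha_k\| \le 1$. Thus, we can apply Theorem 9.2 and represent the sequence $\alpha_k$ by discrete Haar wavelets only,

$$\alpha_k = \sum_{i\in\mathbb{Z}^+} \sum_{n\in\mathbb{Z}} \beta_n^{(i)}\, h_{k - 2^i n}^{(i)}.$$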

Since the expression (12.74a) is the piecewise-constant interpolation of the sequence $\alpha_k$, together with proper scaling and normalization, by linearity, we can apply this interpolation to the discrete Haar wavelets used to represent $\alpha_k$, which leads to a continuous-time Haar wavelet representation of $x^{(\ell)}(t)$:

$$x^{(\ell)}(t) = \sum_{i\in\mathbb{Z}^+} \sum_{n\in\mathbb{Z}} \beta_n^{(i)}\, \psi_{\ell+i,n}(t).$$

This last statement follows from $h^{(i)}$ being of length $2^i$. Thus, for a fixed $n$ and $i$, $\sum_{k\in\mathbb{Z}} h_{k-2^i n}^{(i)}\, \varphi_{\ell,k}(t)$ will equal the Haar wavelet of length $2^i 2^\ell = 2^{i+\ell}$ at shift $n 2^{i+\ell}$. Again by Theorem 9.2, this representation is exact.

What remains to be shown is that $x^{(\ell)}(t)$ can be made arbitrarily close, in $L^2$ norm, to $x(t)$. This is achieved by letting $\ell \to -\infty$ and using the fact that piecewise-constant functions are dense in $L^2(\mathbb{R})$;¹⁵⁷ we get

$$\lim_{\ell\to-\infty} \|x(t) - x^{(\ell)}(t)\| = 0.$$

156 Both of these restrictions are inconsequential; the former because a general function can be decomposed into a sum of two functions, one supported on $t \ge 0$ and one on $t < 0$, the latter because the fraction of the energy of $x(t)$ outside of the interval under consideration can be made arbitrarily small by making $J$ arbitrarily large.
TBD: Might be expanded. 

The proof once more shows the intimate relation between the DWT from Chapter 9 and the wavelet series from this chapter.
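The last step of the proof, $x^{(\ell)} \to x$ as $\ell \to -\infty$, can be observed numerically. In this NumPy sketch (setup and names are ours), the unit-norm test signal $x(t) = \sqrt{2}\sin(\pi t)$ on $[0, 1)$ is represented by $2^{14}$ midpoint samples. The projection (12.74a) at scale $\ell = -m$ is approximated by averaging the samples over each interval $[2^{\ell}k, 2^{\ell}(k+1))$. For this smooth signal, the $L^2$ error roughly halves each time $\ell$ decreases by one, as expected for a first-order piecewise-constant approximation.

```python
import numpy as np

P = 14
t = (np.arange(2 ** P) + 0.5) / 2 ** P   # midpoints of 2^P cells of [0, 1)
x = np.sqrt(2.0) * np.sin(np.pi * t)     # unit-norm test signal on [0, 1)

def haar_projection(x_samples, m):
    """Approximate x^(l), l = -m, from (12.74a): average of x over each interval of length 2^-m."""
    block = len(x_samples) // 2 ** m
    means = x_samples.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

errors = []
for m in range(1, 9):
    err = np.sqrt(np.mean((x - haar_projection(x, m)) ** 2))   # approximates the L2 norm on [0, 1)
    errors.append(err)
    print(f"l = {-m}: error {err:.5f}")
```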

Definition We can now formally define the wavelet series: 



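Definition 12.7 (Wavelet series) The wavelet series of a function $x(t)$ is a function of $\ell, k \in \mathbb{Z}$ given by

$$\beta_k^{(\ell)} = \langle x, \psi_{\ell,k} \rangle = \int_{-\infty}^{\infty} x(t)\, \psi_{\ell,k}(t)\, dt, \qquad \ell, k \in \mathbb{Z}, \tag{12.75a}$$

with $\psi_{\ell,k}(t)$ obtained from the prototype wavelet $\psi(t)$ as in (12.70a). The inverse wavelet series is given by

$$x(t) = \sum_{\ell\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} \beta_k^{(\ell)}\, \psi_{\ell,k}(t). \tag{12.75b}$$

In the above, $\beta_k^{(\ell)}$ are the wavelet coefficients.

To denote such a wavelet series pair, we write:

$$x(t) \;\longleftrightarrow\; \beta_k^{(\ell)}.$$

We derived such bases already and we will see other constructions when we talk about multiresolution analysis.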



157 That is, any $L^2$ function can be approximated arbitrarily closely by a piecewise-constant function over intervals that tend to 0. This is a standard but technical result, and thus we just use it without proof.
12.3.2 Properties of the Wavelet Series

We now consider some of the properties of the wavelet series. Many follow from the properties of the wavelet (Section 12.2.3) or of the DWT (Chapter 9), and thus our treatment will be brief.

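Linearity The wavelet series operator is a linear operator, or,

$$a\, x(t) + b\, y(t) \;\longleftrightarrow\; a\, \beta_k^{(\ell)} + b\, \gamma_k^{(\ell)}, \tag{12.76}$$

where $\beta_k^{(\ell)}$ and $\gamma_k^{(\ell)}$ are the wavelet series of $x(t)$ and $y(t)$, respectively.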

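Shift in Time A shift in time by $2^m n$, $m, n \in \mathbb{Z}$, results in

$$x(t - 2^m n) \;\longleftrightarrow\; \beta_{k - 2^{m-\ell} n}^{(\ell)}, \qquad \ell \le m. \tag{12.77}$$

This is a restrictive condition as it holds only for scales $\ell \le m$; at coarser scales the index shift $2^{m-\ell} n$ is in general no longer an integer. In other words, only a function $x(t)$ that has a scale-limited expansion, that is, one that can be written as

$$x(t) = \sum_{\ell=-\infty}^{m} \sum_{k\in\mathbb{Z}} \beta_k^{(\ell)}\, \psi_{\ell,k}(t),$$

will possess the shift-in-time property for all (of its existing) scales. This is a counterpart to the shift-in-time property of the DWT, (9.17), and the fact that the DWT is periodically shift variant.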
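The restriction $\ell \le m$ in (12.77) is easy to see numerically with the Haar wavelet. In the NumPy sketch below (the random test signal and helper names are hypothetical), $x(t)$ is piecewise constant on quarter-unit cells of $[0, 4)$. The coefficients are computed by midpoint sums, which are exact on the chosen dyadic grid, and the shift is $2^m n = 2$ ($m = n = 1$). The coefficients of the shifted signal match $\beta^{(\ell)}_{k - 2^{m-\ell} n}$ for every $\ell \le 1$ tested. At $\ell = 2$, the required index shift $2^{m-\ell}n = 1/2$ is not an integer, and no such relation holds in general.

```python
import numpy as np

rng = np.random.default_rng(0)
vals = rng.standard_normal(16)   # hypothetical test signal: 16 cells of width 1/4 on [0, 4)

def x_fun(t):
    """Piecewise-constant test signal, zero outside [0, 4)."""
    out = np.zeros_like(t)
    inside = (t >= 0) & (t < 4)
    out[inside] = vals[np.floor(4 * t[inside]).astype(int)]
    return out

def haar_psi(s):
    return np.where((s >= 0) & (s < 0.5), 1.0, 0.0) - np.where((s >= 0.5) & (s < 1), 1.0, 0.0)

def psi_lk(t, l, k):
    return 2.0 ** (-l / 2) * haar_psi(2.0 ** (-l) * t - k)

h = 2.0 ** -4                                   # finer than every breakpoint used below
t = -16 + (np.arange(int(32 / h)) + 0.5) * h    # midpoints covering [-16, 16)

def beta(f, l, k):
    """Wavelet series coefficient <f, psi_{l,k}> of (12.75a), by an exact midpoint sum."""
    return h * np.sum(f(t) * psi_lk(t, l, k))

m, n = 1, 1

def x_shifted(s):
    return x_fun(s - 2 ** m * n)

for l in (-2, -1, 0, 1):
    d = 2 ** (m - l) * n   # integer index shift, valid because l <= m
    err = max(abs(beta(x_shifted, l, k) - beta(x_fun, l, k - d)) for k in range(-4, 16))
    print(f"l = {l}: max mismatch {err:.1e}")
```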

Scaling in Time Scaling in time by $2^{-m}$, $m \in \mathbb{Z}$, results in

$$x(2^{-m} t) \;\longleftrightarrow\; 2^{m/2}\, \beta_k^{(\ell - m)}. \tag{12.78}$$

Parseval's Equality The wavelet series operator is a unitary operator and thus preserves the Euclidean norm (see (1.51)):

$$\int_{-\infty}^{\infty} |x(t)|^2\, dt = \sum_{\ell\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} |\beta_k^{(\ell)}|^2. \tag{12.79}$$

Time-Frequency Localization Assume that the wavelet $\psi(t)$ is centered around $t = 0$ in time and $\omega = 3\pi/4$ in frequency (that is, it is a bandpass filter with the support of approximately $[\pi/2, \pi]$). Then, from (12.70a), $\psi_{\ell,0}(t)$ is centered around $\omega = 2^{-\ell}(3\pi/4)$ in frequency (see Figure 12.29).

With our assumption of $g$ being a causal FIR filter of length $L$, the support in time of the wavelets is easy to characterize. Since the support of $\psi(t)$ is $[0, L-1]$,

$$\operatorname{support}(\psi_{\ell,k}(t)) \subset [2^\ell k,\ 2^\ell (k + L - 1)). \tag{12.80}$$

Because of the FIR assumption, the frequency localization is less precise (no compact support in frequency), but the center frequency is around $2^{-\ell}(3\pi/4)$ and the passband is mostly in an octave band,

$$\operatorname{support}(\Psi_{\ell,k}(\omega)) \sim [2^{-\ell}\pi/2,\ 2^{-\ell}\pi]. \tag{12.81}$$
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CO 



2rt 



<p(0 



*-l,7« 




Figure 12.29: Time-frequency localization of wavelet basis functions. Three wavelets 
are highlighted: at scale I = 0, ip(t); at scale £ — —1, a higher-frequency wavelet ip-ij(t); 
and at scale t = 2, a lower-frequency wavelet tp2,i(t)- These are centered along the dyadic 
sampling grid [2 l k 2~ l (Zw/A)}, for £,keZ. 
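Returning to Parseval's equality (12.79): it can be checked exactly for the Haar wavelet and a hypothetical test function $x(t)$ that is piecewise constant on $2^J$ equal pieces of $[0, 1)$ and zero elsewhere. The sketch below (function name and test data are our own) computes the fine-scale coefficients with the usual Haar averaging and differencing pyramid. At the coarse scales $\ell \ge 1$, only $\psi_{\ell,0}$ overlaps $[0,1)$, which gives $\beta_0^{(\ell)} = 2^{-\ell/2}\int_0^1 x(t)\,dt$. Their squares sum to $(\int_0^1 x)^2$; the code truncates that geometric series after 60 terms.

```python
import math


def haar_series_energy(samples):
    """Sum of |beta_k^(l)|^2 over all Haar wavelet series coefficients of the
    function equal to samples[i] on [i/n, (i+1)/n), n = 2**J, zero elsewhere."""
    n = len(samples)
    J = n.bit_length() - 1
    if n != 2 ** J:
        raise ValueError("the number of pieces must be a power of two")
    # Inner products with the finest-scale scaling functions
    # phi_{-J,i}(t) = 2^{J/2} phi(2^J t - i).
    a = [c * 2.0 ** (-J / 2) for c in samples]
    energy = 0.0
    while len(a) > 1:
        # One Haar analysis step; the differences are the wavelet series
        # coefficients at the next coarser scale (scales -J+1, ..., 0).
        pairs = [(a[2 * k], a[2 * k + 1]) for k in range(len(a) // 2)]
        energy += sum(((p - q) / math.sqrt(2)) ** 2 for p, q in pairs)
        a = [(p + q) / math.sqrt(2) for p, q in pairs]
    mean = a[0]  # <x, phi> = integral of x over [0, 1)
    # Scales l >= 1: only psi_{l,0} overlaps [0, 1), and beta_0^(l) = 2^{-l/2} * mean.
    # The geometric series sums to mean**2; it is truncated after 60 terms here.
    energy += sum(2.0 ** (-l) * mean ** 2 for l in range(1, 61))
    return energy


samples = [1.0, -2.0, 0.5, 3.0, 0.0, 0.0, 4.0, -1.0]
time_energy = sum(c * c for c in samples) / len(samples)  # integral of |x(t)|^2
print(time_energy, haar_series_energy(samples))
```

The truncation leaves an error of $2^{-60}(\int_0^1 x)^2$, far below floating-point resolution.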



Characterization of Singularities As we have seen with the example of the Haar wavelet series (see Figure 12.8), one of the powerful features of the wavelet series is its ability to characterize both the position and the type of singularities present in a function.

Consider a function with the simplest singularity, a Dirac delta function at a location $t_0$, that is, $x(t) = \delta(t - t_0)$. At scale $\ell$, only wavelets whose support (12.80) straddles $t_0$ can produce nonzero coefficients, namely those with

$$ \lfloor t_0/2^{\ell} \rfloor - L < k \le \lfloor t_0/2^{\ell} \rfloor. \tag{12.82} $$

Thus, there are at most $L$ nonzero coefficients at each scale. These coefficients correspond to a region of size $2^{\ell}(L-1)$ around $t_0$, so, as $\ell \to -\infty$, they focus arbitrarily closely on the singularity. What about the size of the coefficients at scale $\ell$? The inner product of the wavelet with a Dirac delta function simply picks out a value of the wavelet. Because of the scaling factor $2^{-\ell/2}$ in (12.70a), the nonzero coefficients will be of order

$$ |\beta_k^{(\ell)}| \sim O(2^{-\ell/2}) \tag{12.83} $$

for the range of $k$ in (12.82). That is, as $\ell \to -\infty$, the nonzero wavelet series coefficients zoom in onto the singularity, and they grow at a specific rate given by (12.83). An example for the Haar wavelet was shown in Figure 12.8.
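For the Haar wavelet ($L = 2$, support $[0, 1]$), the inner product behind (12.82) and (12.83) can be evaluated directly, since $\beta_k^{(\ell)} = \psi_{\ell,k}(t_0) = 2^{-\ell/2}\psi(2^{-\ell}t_0 - k)$. The sketch below (function names are ours) lists the nonzero coefficients of $\delta(t - t_0)$ at a given scale; for the chosen $t_0 = 0.3$ exactly one coefficient survives at each scale, and its magnitude is $2^{-\ell/2}$, growing as $\ell \to -\infty$.

```python
import math


def haar_psi(s):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= s < 0.5:
        return 1.0
    if 0.5 <= s < 1.0:
        return -1.0
    return 0.0


def dirac_coefficients(t0, ell):
    """Nonzero Haar wavelet series coefficients of x(t) = delta(t - t0) at scale ell.

    beta_k = <delta(t - t0), psi_{ell,k}(t)> = 2^{-ell/2} psi(2^{-ell} t0 - k).
    """
    pos = t0 * 2.0 ** (-ell)
    coeffs = {}
    # The Haar wavelet has support [0, 1], so only k close to pos can contribute.
    for k in range(math.floor(pos) - 1, math.floor(pos) + 2):
        value = 2.0 ** (-ell / 2) * haar_psi(pos - k)
        if value != 0.0:
            coeffs[k] = value
    return coeffs


for ell in range(0, -6, -1):
    print(ell, dirac_coefficients(0.3, ell))
```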

Generalizing the Dirac delta function singularity, a function is said to have an $n$th-order singularity at $t_0$ when its $n$th-order derivative has a Dirac delta function component at $t_0$. The scaling (12.83) for a zeroth-order singularity is an example of the following result:

Proposition 12.8 (Scaling behavior around singularities) Given a wavelet $\psi(t)$ with $N$ zero moments, around a singularity of order $n$, $0 \le n \le N$, the wavelet series coefficients $\beta_k^{(\ell)}$ behave as

$$ |\beta_k^{(\ell)}| \sim O(2^{\ell(n - 1/2)}), \qquad \ell \to -\infty. \tag{12.84} $$

Proof. We have analyzed $n = 0$ earlier. We now give a proof for $n = 1$; generalizing to $n > 1$ is the topic of Exercise 12.3.

Assume the wavelet has at least one zero moment, $N \ge 1$. A function with a first-order singularity at $t_0$ looks locally (at $t_0$) like a Heaviside function (3.78). We can reduce the analysis to $n = 0$ by considering the derivative $x'(t)$, which is a Dirac delta function at $t_0$. We use the fact that $\psi$ has at least one zero moment and is of finite support. Then, as in (12.34), using integration by parts,

$$ \int_{-\infty}^{\infty} \psi(t)\, x(t)\, dt = -\int_{-\infty}^{\infty} \theta(t)\, x'(t)\, dt = -\langle x'(t), \theta(t) \rangle_t, $$
$$ \langle x(t), \psi_{\ell,k}(t) \rangle_t = -\langle x'(t), \theta_{\ell,k}(t) \rangle_t, $$

where $\theta(t) = \int_{-\infty}^{t} \psi(\tau)\, d\tau$ is the primitive of $\psi(t)$, $\theta_{\ell,k}(t)$ is the primitive of $\psi_{\ell,k}(t)$, and $x'(t)$ is the derivative of $x(t)$. Because $\Psi(\omega)$ has at least one zero at $\omega = 0$ and $\psi(t)$ is of finite support, its primitive is well defined and also of finite support. The key is now the scaling behavior of $\theta_{\ell,k}(t)$ with respect to $\theta(t)$. Evaluating

$$ \theta_{\ell,k}(t) = \int_{-\infty}^{t} 2^{-\ell/2}\, \psi(2^{-\ell}\tau - k)\, d\tau = 2^{\ell/2} \int_{-\infty}^{2^{-\ell}t - k} \psi(t')\, dt' = 2^{\ell/2}\, \theta(2^{-\ell}t - k), $$

we see that this scaling is given by $2^{\ell/2}$. Therefore, the wavelet coefficients scale as

$$ |\beta_k^{(\ell)}| = |\langle x(t), \psi_{\ell,k}(t) \rangle_t| = |-\langle x'(t), \theta_{\ell,k}(t) \rangle_t| \sim 2^{\ell/2}\, |\langle \delta(t - t_0), \theta(2^{-\ell}t - k) \rangle_t| \sim O(2^{\ell/2}), \tag{12.85} $$

at fine scales and close to $t_0$.
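For the Haar wavelet ($N = 1$) and a unit step $x(t) = u(t - t_0)$, a first-order singularity, the integral in the proof can be done in closed form. The substitution $s = 2^{-\ell}t - k$ gives $\beta_k^{(\ell)} = 2^{\ell/2}\int_{s_0}^{\infty}\psi(s)\,ds$ with $s_0 = 2^{-\ell}t_0 - k$. The sketch below (names and the choice $t_0 = 1/3$ are ours) checks that the single nonzero coefficient per scale has magnitude $2^{\ell/2}/3$. It therefore decays as $O(2^{\ell/2})$, in line with (12.85).

```python
import math


def haar_tail_integral(s0):
    """Integral of the Haar wavelet psi(s) over s >= s0."""
    if 0.0 <= s0 < 0.5:
        return -s0
    if 0.5 <= s0 < 1.0:
        return s0 - 1.0
    return 0.0  # s0 < 0: zero moment of psi; s0 >= 1: beyond the support


def step_coefficients(t0, ell):
    """Nonzero Haar wavelet series coefficients of the unit step u(t - t0) at scale ell.

    beta_k = 2^{ell/2} * integral_{s >= s0} psi(s) ds, with s0 = 2^{-ell} t0 - k.
    """
    pos = t0 * 2.0 ** (-ell)
    coeffs = {}
    for k in range(math.floor(pos) - 1, math.floor(pos) + 2):
        value = 2.0 ** (ell / 2) * haar_tail_integral(pos - k)
        if value != 0.0:
            coeffs[k] = value
    return coeffs


for ell in range(0, -6, -1):
    print(ell, step_coefficients(1 / 3, ell))
```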

Zero-Moment Property When the lowpass filter $g$ has $N$ zeros at $\omega = \pi$, we verified that $\psi(t)$ has $N$ zero moments (12.63). This property carries over to all scaled versions of $\psi(t)$, and thus, for any polynomial function $p(t)$ of degree smaller than $N$,

$$ \beta_k^{(\ell)} = \langle p(t), \psi_{\ell,k}(t) \rangle_t = 0. $$

This allows us to prove the following result:

Proposition 12.9 (Decay of wavelet series coefficients for $x \in C^N$) For a function $x(t)$ with $N$ continuous and bounded derivatives, that is, $x \in C^N$, the wavelet series coefficients decay as

$$ |\beta_k^{(\ell)}| \le \alpha\, 2^{\ell N} $$

for some constant $\alpha > 0$ and $\ell \to -\infty$.

Proof. Consider the Taylor series expansion of $x(t)$ around some point $t_0$. Since $x(t)$ has $N$ continuous derivatives,

$$ x(t_0 + \epsilon) = x(t_0) + \frac{x'(t_0)}{1!}\epsilon + \frac{x''(t_0)}{2!}\epsilon^2 + \cdots + \frac{x^{(N-1)}(t_0)}{(N-1)!}\epsilon^{N-1} + R_N(\epsilon) = p(t_0 + \epsilon) + R_N(\epsilon), $$

where

$$ |R_N(\epsilon)| \le \frac{|\epsilon|^N}{N!} \sup_{t_0 \le t \le t_0 + \epsilon} |x^{(N)}(t)|, $$

and we view it as a polynomial $p$ of degree $N - 1$ plus a remainder $R_N(\epsilon)$. Because of the zero-moment property of the wavelet,

$$ |\beta_k^{(\ell)}| = |\langle x(t), \psi_{\ell,k}(t) \rangle| = |\langle p + R_N(\epsilon), \psi_{\ell,k}(t) \rangle| = |\langle R_N(\epsilon), \psi_{\ell,k}(t) \rangle|, $$

that is, the inner product with the polynomial term is zero, and only the remainder matters. To minimize the upper bound on $|\langle R_N(\epsilon), \psi_{\ell,k} \rangle|$, we want $t_0$ close to the center of the wavelet. Since the spacing of the sampling grid at scale $\ell$ is $2^{\ell}$, we see that $\epsilon$ is at most of order $2^{\ell}$, and thus $|\langle R_N(\epsilon), \psi_{\ell,k} \rangle|$ has an upper bound of order $2^{\ell N}$.

A stronger result, in which $N$ is replaced by $N + 1/2$, follows from Proposition 12.17 in the context of the continuous wavelet transform.
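The decay rate can be observed for the Haar wavelet, which has $N = 1$ zero moment. Proposition 12.9 then gives $O(2^{\ell})$, and the stronger $N + 1/2$ version gives $O(2^{3\ell/2})$. The sketch below uses a test function of our own choosing, the smooth $x(t) = \sin t$, and computes the coefficients exactly from the antiderivative $-\cos t$. It checks that $\beta_k^{(\ell)}\,2^{-3\ell/2}$ settles to $-\cos(t_m)/4$, where $t_m$ is the center of the wavelet's support. That constant comes from a second-order Taylor expansion of the antiderivative, not from the text.

```python
import math


def haar_coefficient_sin(ell, k):
    """Exact Haar wavelet series coefficient of x(t) = sin(t).

    psi_{ell,k}(t) equals 2^{-ell/2} on [lo, mid) and -2^{-ell/2} on [mid, hi),
    so beta = 2^{-ell/2} (F(mid) - F(lo) - (F(hi) - F(mid))) with F(t) = -cos(t).
    Returns the coefficient and the center `mid` of the wavelet's support.
    """
    width = 2.0 ** ell
    lo, mid, hi = k * width, (k + 0.5) * width, (k + 1) * width
    beta = 2.0 ** (-ell / 2) * (math.cos(lo) + math.cos(hi) - 2.0 * math.cos(mid))
    return beta, mid


for ell in range(-6, -13, -1):
    k = 2 ** (-ell)  # wavelet supported on [1, 1 + 2^ell)
    beta, mid = haar_coefficient_sin(ell, k)
    # Second-order Taylor prediction: beta ~ -cos(mid) * 2^{3 ell / 2} / 4.
    print(ell, beta, beta * 2.0 ** (-1.5 * ell), -math.cos(mid) / 4)
```

Each step to a finer scale shrinks the coefficient by roughly $2^{3/2} \approx 2.83$, one factor $2^{1/2}$ faster than the bound in Proposition 12.9.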

12.3.3 Multiresolution Analysis 

We have already introduced the concept of multiresolution analysis with the Haar scaling function and wavelet in Section 12.1. Instead of starting from a discrete-time filter and constructing a continuous-time basis from it, multiresolution analysis proceeds in the opposite direction: it starts from the multiresolution spaces and builds the wavelet series from them. For example, we saw that the continuous-time wavelet basis generated a sequence of nested subspaces of $\mathcal{L}^2(\mathbb{R})$,

$$ \ldots \subset V^{(2)} \subset V^{(1)} \subset V^{(0)} \subset V^{(-1)} \subset V^{(-2)} \subset \ldots, $$

and that these spaces were all scaled copies of each other, that is, $V^{(\ell)}$ is $V^{(0)}$ scaled by $2^{\ell}$. We will turn the question around and ask: assuming we have a sequence of nested and scaled spaces as above, does it generate a discrete-time filter bank? The answer is yes; the framework is the multiresolution analysis we have seen in the Haar case. We present it shortly in its more general form, starting with the axiomatic definition and followed by examples.

The embedded spaces above are very natural for piecewise-polynomial functions over uniform intervals of length $2^{\ell}$. For example, the Haar case leads to piecewise-constant functions. The next higher order is for piecewise-linear functions, and so on. The natural bases for such spaces are the B-splines we discussed in Chapter 5; these are not orthonormal bases, requiring the use of orthogonalization methods.

Axioms of Multiresolution Analysis We now summarize the fundamental characteristics of the spaces and basis functions seen in the Haar case. These are also the axioms of multiresolution analysis.

(i) Embedding: We work with a sequence of embedded spaces

$$ \ldots \subset V^{(2)} \subset V^{(1)} \subset V^{(0)} \subset V^{(-1)} \subset V^{(-2)} \subset \ldots, \tag{12.86a} $$

where $V^{(\ell)}$ is the space of piecewise-constant functions over $[2^{\ell}k, 2^{\ell}(k+1))_{k \in \mathbb{Z}}$ with finite $\mathcal{L}^2$ norm. We call the $V^{(\ell)}$s successive approximation spaces, since as $\ell \to -\infty$, we get finer and finer approximations.
(ii) Upward Completeness: Since piecewise-constant functions over arbitrarily short intervals are dense in $\mathcal{L}^2$ (see Footnote 157),

$$ \lim_{\ell \to -\infty} V^{(\ell)} = \overline{\bigcup_{\ell \in \mathbb{Z}} V^{(\ell)}} = \mathcal{L}^2(\mathbb{R}). \tag{12.86b} $$

(iii) Downward Completeness: As $\ell \to \infty$, we get coarser and coarser approximations. Given a function $x(t) \in \mathcal{L}^2(\mathbb{R})$, its projection onto $V^{(\ell)}$ tends to zero as $\ell \to \infty$, since we lose all the details. More formally,

$$ \bigcap_{\ell \in \mathbb{Z}} V^{(\ell)} = \{0\}. \tag{12.86c} $$

(iv) Scale Invariance: The spaces $V^{(\ell)}$ are just scaled versions of each other,

$$ x(t) \in V^{(\ell)} \;\Leftrightarrow\; x(2^m t) \in V^{(\ell - m)}. \tag{12.86d} $$

(v) Shift Invariance: Because $x(t)$ is a piecewise-constant function over intervals $[2^{\ell}k, 2^{\ell}(k+1))$, the space $V^{(\ell)}$ is invariant to shifts by multiples of $2^{\ell}$,

$$ x(t) \in V^{(\ell)} \;\Leftrightarrow\; x(t - 2^{\ell}k) \in V^{(\ell)}. \tag{12.86e} $$

(vi) Existence of a Basis: There exists $\varphi(t) \in V^{(0)}$ such that

$$ \{\varphi(t - k)\}_{k \in \mathbb{Z}} \tag{12.86f} $$

is a basis for $V^{(0)}$.

The above six characteristics, which naturally generalize the Haar multiresolution analysis, are the defining characteristics of a broad class of wavelet systems.

Definition 12.10 (Multiresolution analysis) A sequence $\{V^{(\ell)}\}_{\ell \in \mathbb{Z}}$ of subspaces of $\mathcal{L}^2(\mathbb{R})$ satisfying (12.86a)-(12.86f) is called a multiresolution analysis. The spaces $V^{(\ell)}$ are called the successive approximation spaces, while the spaces $W^{(\ell)}$, defined as the orthogonal complements of $V^{(\ell)}$ in $V^{(\ell-1)}$, that is,

$$ V^{(\ell-1)} = V^{(\ell)} \oplus W^{(\ell)}, \tag{12.87} $$

are called the successive detail spaces.

For simplicity, we will assume the basis in (12.86f) to be orthonormal; we cover the general case in Solved Exercise 12.3.

The two-scale equation (12.48) follows naturally from the scale-invariance axiom (iv). What can we say about the coefficients $g_n$? Evaluate

$$ \delta_k \overset{(a)}{=} \langle \varphi(t), \varphi(t-k) \rangle_t \overset{(b)}{=} 2 \sum_{n \in \mathbb{Z}} \sum_{m \in \mathbb{Z}} g_n g_m \langle \varphi(2t - n), \varphi(2t - 2k - m) \rangle_t \overset{(c)}{=} \sum_{n \in \mathbb{Z}} g_n g_{n-2k}, $$

where (a) is true by assumption; in (b) we substituted the two-scale equation (12.48) for both $\varphi(t)$ and $\varphi(t-k)$; and (c) follows from $\langle \varphi(2t-n), \varphi(2t-2k-m) \rangle_t = 0$ except for $n = 2k + m$, when it is $1/2$. We thus conclude that the sequence $g_n$ corresponds to an orthogonal filter (7.13). Assuming that the Fourier transform $\Phi(\omega)$ of $\varphi(t)$ is continuous and satisfies^158

$$ |\Phi(0)| = 1, $$

it follows from the two-scale equation in the Fourier domain that

$$ |G(1)| = \sqrt{2}, $$

making $g_n$ a lowpass sequence. Assume it to be of finite length $L$ and derive the equivalent highpass filter using (7.24). Defining the wavelet as in (12.59), we have:

Proposition 12.11 The wavelet given by (12.59) satisfies

$$ \langle \psi(t), \psi(t - n) \rangle_t = \delta_n, \qquad \langle \psi(t), \varphi(t - n) \rangle_t = 0, $$

and $W^{(0)} = \operatorname{span}(\{\psi(t - n)\}_{n \in \mathbb{Z}})$ is the orthogonal complement of $V^{(0)}$ in $V^{(-1)}$,

$$ V^{(-1)} = V^{(0)} \oplus W^{(0)}. \tag{12.88} $$

158 If $\varphi(t)$ is integrable, this follows from upward completeness (12.86b), for example.

We do not prove the proposition but rather just discuss the outline of a proof. The orthogonality relations follow from the orthogonality of the sequences $g_n$ and $h_n$ by using the two-scale equations (12.48) and (12.59). That $\{\psi(t - n)\}_{n \in \mathbb{Z}}$ is an orthonormal basis for $W^{(0)}$ requires checking completeness and is more technical. By construction, and in parallel to (12.86d), the spaces $W^{(\ell)}$ are just scaled versions of each other,

$$ x(t) \in W^{(\ell)} \;\Leftrightarrow\; x(2^m t) \in W^{(\ell - m)}. \tag{12.89} $$
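The filter-level facts used here can be checked numerically for a concrete filter: $\sum_n g_n g_{n-2k} = \delta_k$, $\sum_n h_n g_{n-2k} = 0$, $G(1) = \sqrt{2}$, and the correspondence between zeros at $\omega = \pi$ and vanishing moments of the highpass filter. The sketch below assumes the length-4 Daubechies lowpass filter and the highpass convention $h_n = (-1)^n g_{L-1-n}$. Neither choice is fixed by the text at this point, and (7.24) may use a different shift or sign.

```python
import math

SQRT2, SQRT3 = math.sqrt(2.0), math.sqrt(3.0)
# Assumed example: length-4 Daubechies lowpass filter (two zeros at omega = pi).
g = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
     (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]
L = len(g)
# Assumed highpass convention: h_n = (-1)^n g_{L-1-n}.
h = [(-1) ** n * g[L - 1 - n] for n in range(L)]


def corr2(p, q, k):
    """sum_n p_n q_{n-2k} for finite sequences indexed from 0."""
    return sum(p[n] * q[n - 2 * k] for n in range(len(p)) if 0 <= n - 2 * k < len(q))


def moment(p, m):
    """Discrete moment sum_n n^m p_n."""
    return sum(n ** m * p[n] for n in range(len(p)))


print([corr2(g, g, k) for k in (-1, 0, 1)])  # approximately delta_k
print([corr2(h, g, k) for k in (-1, 0, 1)])  # approximately zero
print(sum(g), [moment(h, m) for m in range(3)])
```

The zeroth and first moments of $h_n$ vanish while the second does not, matching the two zeros of $G(e^{j\omega})$ at $\omega = \pi$.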

Putting all the pieces above together, we have:

Theorem 12.12 (Wavelet basis for $\mathcal{L}^2(\mathbb{R})$) Given a multiresolution analysis of $\mathcal{L}^2(\mathbb{R})$ from Definition 12.10, the family

$$ \psi_{\ell,k}(t) = \frac{1}{2^{\ell/2}}\, \psi\!\left(\frac{t - 2^{\ell}k}{2^{\ell}}\right), \qquad \ell, k \in \mathbb{Z}, $$

with $\psi(t)$ as in (12.59), is an orthonormal basis for $\mathcal{L}^2(\mathbb{R})$.

Proof. Scaling (12.88) using (12.86d), we get that $V^{(\ell)} = V^{(\ell+1)} \oplus W^{(\ell+1)}$. Iterating it $n$ times leads to

$$ V^{(\ell)} = W^{(\ell+1)} \oplus W^{(\ell+2)} \oplus \cdots \oplus W^{(\ell+n)} \oplus V^{(\ell+n)}. $$

As $n \to \infty$ and because of (12.86c), we get^159

$$ V^{(\ell)} = \bigoplus_{i=\ell+1}^{\infty} W^{(i)}, $$

and finally, letting $\ell \to -\infty$ and because of (12.86b), we obtain

$$ \mathcal{L}^2(\mathbb{R}) = \bigoplus_{\ell \in \mathbb{Z}} W^{(\ell)}. \tag{12.90} $$

Since $\{\psi(t - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $W^{(0)}$, by scaling, $\{\psi_{\ell,k}(t)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $W^{(\ell)}$. Then, following (12.90), the family $\{\psi_{\ell,k}(t)\}_{\ell,k \in \mathbb{Z}}$ is an orthonormal basis for $\mathcal{L}^2(\mathbb{R})$.

Thus, in a fashion complementary to Section 12.1, we obtain a split of $\mathcal{L}^2(\mathbb{R})$ into a collection $\{W^{(\ell)}\}_{\ell \in \mathbb{Z}}$ as a consequence of the axioms of multiresolution analysis (12.86a)-(12.86f) (see Figure 12.10 for a graphical representation of the spaces $V^{(\ell)}$ and $W^{(\ell)}$). We illustrate our discussion with examples.

Examples

159 In the infinite sum, we imply closure.

Figure 12.30: Sinc scaling function and wavelet. (a) Scaling function $\varphi(t)$. (b) Magnitude Fourier transform $|\Phi(\omega)|$. (c) Wavelet $\psi(t)$. (d) Magnitude Fourier transform $|\Psi(\omega)|$.



Example 12.5 (Sinc multiresolution analysis) Let $V^{(0)}$ be the space of $\mathcal{L}^2$ functions bandlimited to $[-\pi, \pi)$, for which we know that

$$ \varphi(t) = \frac{\sin(\pi t)}{\pi t} \tag{12.91} $$

and its integer shifts form an orthonormal basis. Define $V^{(\ell)}$ to be the space of $\mathcal{L}^2$ functions bandlimited to $[-2^{-\ell}\pi, 2^{-\ell}\pi)$. These are nested spaces of bandlimited functions, which obviously satisfy (12.86a), as well as the remaining axioms of multiresolution analysis (12.86b)-(12.86f): the union of the $V^{(\ell)}$s is dense in $\mathcal{L}^2(\mathbb{R})$, their intersection is $\{0\}$, the spaces are scaled versions of each other, and they are shift invariant with respect to shifts by integer multiples of $2^{\ell}$. The existence of the basis was stated in (12.91). The details are left as Exercise 12.8, including the derivation of the wavelet and the detail spaces $W^{(\ell)}$, the spaces of $\mathcal{L}^2$ bandpass functions with Fourier support

$$ W^{(\ell)}: \quad [-2^{-\ell+1}\pi, -2^{-\ell}\pi) \cup [2^{-\ell}\pi, 2^{-\ell+1}\pi). \tag{12.92} $$

Figure 12.30 shows the sinc scaling function and wavelet in both the time and Fourier domains.

While the perfect bandpass spaces lead to a bona fide multiresolution analysis of $\mathcal{L}^2(\mathbb{R})$, the basis functions have slow decay in time. Since the Fourier transform is discontinuous, the tails of the scaling function and the wavelet decay only as $O(1/t)$ (as can be seen in the sinc function (12.91)). We will see in later examples possible remedies to this problem.

Example 12.6 (Piecewise-linear multiresolution analysis) Let $V^{(0)}$ be the space of continuous $\mathcal{L}^2$ functions piecewise linear over intervals $[k, k+1)$, that is, a continuous $x(t)$ belongs to $V^{(0)}$ if $\|x\| < \infty$ and $x'(t)$ is piecewise constant over intervals $[k, k+1)$. For simplicity, consider functions $x(t)$ such that $x(t) = 0$ for $t < 0$. Then $x'(t)$ is specified by the sequence $\{a_k\}$, the slopes of $x(t)$ over intervals $[k, k+1)$, for $k \in \mathbb{N}$. The nodes of $x(t)$, that is, the values at the integers, are given by

$$ x(k) = \sum_{i=0}^{k-1} a_i, $$

and the piecewise-linear function is

$$ x(t) = [x(k+1) - x(k)](t - k) + x(k) = a_k (t - k) + \sum_{i=0}^{k-1} a_i \tag{12.93} $$

for $t \in [k, k+1)$ (see Figure 12.31).

The spaces $V^{(\ell)}$ are simply scaled versions of $V^{(0)}$; they contain functions that are continuous and piecewise linear over intervals $[2^{\ell}k, 2^{\ell}(k+1))$. Let us verify the axioms of multiresolution.

(i) Embedding: Embedding as in (12.86a) is clear.

(ii) Upward Completeness: Similarly to the piecewise-constant case, piecewise-linear functions are dense in $\mathcal{L}^2(\mathbb{R})$ (see Footnote 157), and thus upward completeness (12.86b) holds.

(iii) Downward Completeness: Conversely, as $\ell \to \infty$, the approximation gets coarser and coarser, ultimately verifying downward completeness (12.86c).

(iv) Scale Invariance: Scaling (12.86d) is clear from the definition of the piecewise-linear functions over intervals scaled by powers of 2.

(v) Shift Invariance: Similarly, shift invariance (12.86e) is clear from the definition of the piecewise-linear functions over intervals scaled by powers of 2.

(vi) Existence of a Basis: What remains is to find a basis for $V^{(0)}$. As an educated guess, take the hat function from (3.49a) shifted by 1 to the right and call it $\theta(t)$ (see Figure 12.32(a)). Then $x(t)$ in (12.93) can be written as

$$ x(t) = \sum_{k=0}^{\infty} b_k\, \theta(t - k), $$

with $b_0 = a_0$ and $b_k = a_k + b_{k-1}$. We prove this as follows: First,

$$ \theta'(t) = \varphi_h(t) - \varphi_h(t - 1), $$

where $\varphi_h(t)$ is the Haar scaling function, the indicator function of the unit interval. Thus, $x'(t)$ is piecewise constant. Then, the value of the constant between $k$ and $k+1$ is $(b_k - b_{k-1})$ and thus equals $a_k$ as desired. The only detail is that $\theta(t)$ is clearly not orthogonal to its integer translates, since

$$ \langle \theta(t), \theta(t - k) \rangle_t = \begin{cases} 2/3, & k = 0; \\ 1/6, & k = -1, 1; \\ 0, & \text{otherwise}. \end{cases} $$

We can apply the orthogonalization procedure in (12.34). The $z$-transform of the sequence $[1/6 \;\; 2/3 \;\; 1/6]$ is

$$ \frac{1}{6}(z + 4 + z^{-1}) \overset{(a)}{=} \frac{2 + \sqrt{3}}{6}\, [1 + (2 - \sqrt{3})z]\, [1 + (2 - \sqrt{3})z^{-1}], $$

where (a) follows from it being a deterministic autocorrelation, positive on the unit circle, which can thus be factored into its spectral factors. Keeping just the factor in $z^{-1}$ and inverting it, with the right-sided impulse response

$$ \alpha_k = \sqrt{6(2 - \sqrt{3})}\, (-1)^k (2 - \sqrt{3})^k, \qquad k \in \mathbb{N}, $$

leads to

$$ \varphi_c(t) = \sum_{k=0}^{\infty} \alpha_k\, \theta(t - k), $$

a function such that $\varphi_c(t) = 0$ for $t < 0$ and orthonormal to its integer translates. It is piecewise linear over integer pieces, but of infinite extent (see Figure 12.32(b)).

Instead of the spectral factorization, we can just take the square root as in (12.34). In the Fourier domain,

$$ \frac{1}{6}e^{j\omega} + \frac{2}{3} + \frac{1}{6}e^{-j\omega} = \frac{1}{3}(2 + \cos\omega). $$

Then,

$$ \Phi_s(\omega) = \frac{\sqrt{3}\, \Theta(\omega)}{(2 + \cos\omega)^{1/2}} $$

is the Fourier transform of a symmetric and orthogonal scaling function $\varphi_s(t)$ (see Figure 12.32(c)).

Because of the embedding of the spaces $V^{(\ell)}$, the scaling functions all satisfy two-scale equations (Exercise 12.6). Once the two-scale equation coefficients are derived, the wavelet can be calculated in the standard manner. Naturally, since the integer shifts of the wavelet form a basis for the orthogonal complement of $V^{(0)}$ in $V^{(-1)}$, it will be piecewise linear over half-integer intervals (Exercise 12.7).
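The orthonormality of $\varphi_c(t)$ can be checked without any integration. Since $\varphi_c(t) = \sum_k \alpha_k\theta(t-k)$, its inner products with its own integer shifts combine the coefficients $\alpha_k$ with the Gram sequence $[1/6 \;\; 2/3 \;\; 1/6]$ of $\theta$. The sketch below truncates $\alpha_k$ to 40 terms, an arbitrary choice of ours that is harmless because the terms decay like $(2 - \sqrt{3})^k$. It then evaluates $\langle\varphi_c(t), \varphi_c(t-m)\rangle_t$ for a few lags.

```python
import math

# Gram sequence <theta(t), theta(t - k)>_t of the hat function theta, k = -1, 0, 1.
GRAM = {-1: 1 / 6, 0: 2 / 3, 1: 1 / 6}


def alpha(num_terms):
    """Truncated alpha_k = sqrt(6(2 - sqrt 3)) (-1)^k (2 - sqrt 3)^k, k = 0, 1, ..."""
    a = 2 - math.sqrt(3)
    return [math.sqrt(6 * a) * (-a) ** k for k in range(num_terms)]


def gram_phi_c(coeffs, m):
    """<phi_c(t), phi_c(t - m)>_t for phi_c(t) = sum_k coeffs[k] theta(t - k).

    <theta(t - j), theta(t - k - m)>_t = GRAM[k + m - j] (zero outside |.| <= 1).
    """
    return sum(cj * ck * GRAM.get(k + m - j, 0.0)
               for j, cj in enumerate(coeffs)
               for k, ck in enumerate(coeffs))


coeffs = alpha(40)
print([gram_phi_c(coeffs, m) for m in range(4)])  # approximately [1, 0, 0, 0]
```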

Example 12.7 (Meyer multiresolution analysis) The idea behind Meyer's wavelet construction is to smooth the sinc solution in the Fourier domain, so as to obtain faster decay of the basis functions in the time domain. The simplest way to do this is to allow the squared Fourier transform magnitude of the scaling function, $|\Phi(\omega)|^2$, to decay linearly to zero, that is,

$$ |\Phi(\omega)|^2 = \begin{cases} 1, & |\omega| < \frac{2\pi}{3}; \\ 2 - \frac{3|\omega|}{2\pi}, & \frac{2\pi}{3} \le |\omega| < \frac{4\pi}{3}; \\ 0, & \text{otherwise}. \end{cases} \tag{12.94} $$

We start by defining a function orthonormal to its integer translates and the space $V^{(0)}$ spanned by those, axiom (vi).

Figure 12.31: (a) A continuous and piecewise-linear function $x(t)$ and (b) its derivative $x'(t)$.

Figure 12.32: Basis function for piecewise-linear spaces. (a) The nonorthogonal basis function $\theta(t)$. (b) An orthogonalized basis function $\varphi_c(t)$ such that $\varphi_c(t) = 0$ for $t < 0$. (c) An orthogonalized symmetric basis function $\varphi_s(t)$.



(i) Existence of a Basis: The basis function is shown in Figure 12.33, where we also show graphically that (3.73d) holds, proving that $\{\varphi(t - k)\}_{k \in \mathbb{Z}}$ is an orthonormal set. We now define $V^{(0)}$ to be

$$ V^{(0)} = \operatorname{span}(\{\varphi(t - k)\}_{k \in \mathbb{Z}}). $$

(ii) Upward Completeness: Define $V^{(\ell)}$ as the scaled version of $V^{(0)}$. Then (12.86b) holds, similarly to the sinc case.
(iii) Downward Completeness: Again, (12.86c) holds.
(iv) Scale Invariance: Holds by construction.
(v) Shift Invariance: Holds by construction.
(vi) Embedding: To check $V^{(0)} \subset V^{(-1)}$, we use Figure 12.34 to see that $V^{(0)}$ is perfectly represented in $V^{(-1)}$. This means we can find a $2\pi$-periodic



a3.0 [October 2011] CC by-nc-nd 



Comments to book-errata@FourierAndWavclets.org 



Fourier and Wavelet Signal Processing 



Copyright 2011 M. Vetterli, J. Kovaccvic, and V. K. Goyal 



836 



Chapter 12. Wavelet Bases, Frames and Transforms on Functions 




Figure 12.33: Meyer scaling function, with a piecewise-linear squared Fourier transform magnitude. (a) The function $|\Phi(\omega)|^2$. (b) Proof of orthogonality by verifying (3.73d), that is, $\sum_k |\Phi(\omega + 2\pi k)|^2 = 1$.



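function $G(e^{j\omega})$ to satisfy the two-scale equation in the Fourier domain (12.47), illustrated in Figure 12.35.

Now that we have verified the axioms of multiresolution analysis, we can construct the wavelet. From (12.94), (12.47), and the figure, the DTFT of the discrete-time filter $g_n$ is

$$ G(e^{j\omega}) = \begin{cases} \sqrt{2}, & |\omega| \le \frac{\pi}{3}; \\ \sqrt{2}\,\Phi(2\omega), & \frac{\pi}{3} < |\omega| \le \frac{2\pi}{3}; \\ 0, & \frac{2\pi}{3} < |\omega| \le \pi. \end{cases} $$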



As the phase is not specified, we choose it to be zero, making $G(e^{j\omega})$ real and symmetric. Such a filter has an infinite impulse response, and its $z$-transform is not rational (since it is exactly zero over an interval of nonzero measure). It does, however, satisfy the quadrature formula for an orthogonal lowpass filter from (7.13). Choosing the highpass filter in the standard way, (7.24), with $G(e^{j\omega})$ real, and using the two-scale equation for the wavelet in the Fourier domain, (12.58), we get

$$ |\Psi(\omega)| = \begin{cases} 0, & |\omega| < \frac{2\pi}{3}; \\ \left(\frac{3|\omega|}{2\pi} - 1\right)^{1/2}, & \frac{2\pi}{3} \le |\omega| < \frac{4\pi}{3}; \\ \left(2 - \frac{3|\omega|}{4\pi}\right)^{1/2}, & \frac{4\pi}{3} \le |\omega| < \frac{8\pi}{3}; \\ 0, & |\omega| \ge \frac{8\pi}{3}. \end{cases} \tag{12.96} $$



The construction and resulting wavelet (a bandpass function) are shown in Figure 12.36. Finally, the scaling function $\varphi(t)$ and wavelet $\psi(t)$ are shown, together with their Fourier transforms, in Figure 12.37.

Figure 12.34: Embedding $V^{(0)} \subset V^{(-1)}$ for the Meyer wavelet.

Figure 12.35: The two-scale equation for the Meyer wavelet in the frequency domain. Note how the $4\pi$-periodic function $G(e^{j\omega/2})$ carves out $\Phi(\omega)$ from $\Phi(\omega/2)$.



The example above showed all the ingredients of the general construction of Meyer wavelets. The key was the orthogonality relation for $\Phi(\omega)$, the fact that $\Phi(\omega)$ is continuous, and that the spaces $V^{(\ell)}$ are embedded. Since $\Phi(\omega)$ is continuous, $\varphi(t)$ decays as $O(1/t^2)$. Smoother $\Phi(\omega)$s can be constructed, leading to faster decay of $\varphi(t)$ (Exercise 12.9).
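The facts this construction relies on can be checked on a frequency grid. The first is the orthonormality condition $\sum_k |\Phi(\omega + 2\pi k)|^2 = 1$. The second is that $|\Psi(\omega)|^2 = \tfrac12 |G(e^{j(\omega/2+\pi)})|^2 |\Phi(\omega/2)|^2$ reproduces (12.96), with $|G(e^{j\theta})|^2 = 2|\Phi(2\theta)|^2$ on $|\theta| \le \pi$ read off from (12.94) and (12.47). The third, an extra check not stated in the text, is the dyadic partition $\sum_{\ell}|\Psi(2^\ell\omega)|^2 = 1$ expected of an orthonormal wavelet. This sketch works only with squared magnitudes, so it does not depend on the phase convention of the highpass filter; the grid is an arbitrary choice of ours.

```python
import math

TWO_PI = 2 * math.pi


def phi_sq(w):
    """|Phi(w)|^2 of the Meyer example (12.94)."""
    w = abs(w)
    if w <= TWO_PI / 3:
        return 1.0
    if w <= 2 * TWO_PI / 3:
        return 2.0 - 3.0 * w / TWO_PI
    return 0.0


def g_sq(theta):
    """|G(e^{j theta})|^2 = 2 |Phi(2 theta)|^2 on [-pi, pi], extended 2pi-periodically."""
    theta = theta - TWO_PI * round(theta / TWO_PI)  # wrap to [-pi, pi]
    return 2.0 * phi_sq(2 * theta)


def psi_sq_two_scale(w):
    """|Psi(w)|^2 = (1/2) |G(e^{j(w/2 + pi)})|^2 |Phi(w/2)|^2."""
    return 0.5 * g_sq(w / 2 + math.pi) * phi_sq(w / 2)


def psi_sq_closed_form(w):
    """|Psi(w)|^2 read off from (12.96)."""
    w = abs(w)
    if TWO_PI / 3 <= w < 2 * TWO_PI / 3:
        return 3.0 * w / TWO_PI - 1.0
    if 2 * TWO_PI / 3 <= w < 4 * TWO_PI / 3:
        return 2.0 - 3.0 * w / (2 * TWO_PI)
    return 0.0


grid = [0.013 + 0.0371 * i for i in range(700)]  # 0 < w < 26
worst = max(abs(psi_sq_two_scale(w) - psi_sq_closed_form(w)) for w in grid)
print(worst)
```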



12.3.4 Biorthogonal Wavelet Series

Instead of one scaling function and one wavelet, we now seek two scaling functions, $\varphi(t)$ and $\tilde\varphi(t)$, as well as two corresponding wavelets, $\psi(t)$ and $\tilde\psi(t)$, as in Section 12.2.4, such that the families

$$ \psi_{\ell,k}(t) = \frac{1}{2^{\ell/2}}\, \psi\!\left(\frac{t - 2^{\ell}k}{2^{\ell}}\right), \tag{12.97a} $$
$$ \tilde\psi_{\ell,k}(t) = \frac{1}{2^{\ell/2}}\, \tilde\psi\!\left(\frac{t - 2^{\ell}k}{2^{\ell}}\right), \tag{12.97b} $$

for $\ell, k \in \mathbb{Z}$, form a biorthogonal set,

$$ \langle \psi_{\ell,k}(t), \tilde\psi_{m,n}(t) \rangle_t = \delta_{\ell - m}\, \delta_{k - n}, $$

and are complete in $\mathcal{L}^2(\mathbb{R})$. That is, any $x(t) \in \mathcal{L}^2(\mathbb{R})$ can be written as either

$$ x(t) = \sum_{\ell \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \beta_k^{(\ell)}\, \psi_{\ell,k}(t), \qquad \beta_k^{(\ell)} = \langle x, \tilde\psi_{\ell,k} \rangle, $$

or

$$ x(t) = \sum_{\ell \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \tilde\beta_k^{(\ell)}\, \tilde\psi_{\ell,k}(t), \qquad \tilde\beta_k^{(\ell)} = \langle x, \psi_{\ell,k} \rangle. $$

Figure 12.36: Construction of the wavelet from the two-scale equation. (a) The stretched scaling function $\Phi(\omega/2)$. (b) The stretched and shifted lowpass filter $G(e^{j(\omega/2 + \pi)})$. (c) The resulting bandpass wavelet $\Psi(\omega)$.

Figure 12.37: Meyer scaling function and wavelet. (a) $\varphi(t)$. (b) $\Phi(\omega)$. (c) $\psi(t)$. (d) $\Psi(\omega)$.

These scaling functions and wavelets will satisfy two-scale equations as before,

$$ \varphi(t) = \sqrt{2} \sum_{n \in \mathbb{Z}} g_n\, \varphi(2t - n), \qquad \tilde\varphi(t) = \sqrt{2} \sum_{n \in \mathbb{Z}} \tilde g_n\, \tilde\varphi(2t - n), $$
$$ \psi(t) = \sqrt{2} \sum_{n \in \mathbb{Z}} h_n\, \varphi(2t - n), \qquad \tilde\psi(t) = \sqrt{2} \sum_{n \in \mathbb{Z}} \tilde h_n\, \tilde\varphi(2t - n). $$

We can then define a biorthogonal multiresolution analysis by

$$ V^{(0)} = \operatorname{span}(\{\varphi(t - k)\}_{k \in \mathbb{Z}}), \qquad \tilde V^{(0)} = \operatorname{span}(\{\tilde\varphi(t - k)\}_{k \in \mathbb{Z}}), $$

and the appropriate scaled spaces

$$ V^{(\ell)} = \operatorname{span}(\{\varphi_{\ell,k}\}_{k \in \mathbb{Z}}), \qquad \tilde V^{(\ell)} = \operatorname{span}(\{\tilde\varphi_{\ell,k}\}_{k \in \mathbb{Z}}), \tag{12.98} $$

for $\ell \in \mathbb{Z}$. For a given $\varphi(t)$ (for example, the hat function), we can verify the axioms of multiresolution analysis (Exercise 12.14). From there, define the wavelet families as in (12.97a)-(12.97b), which then lead to the wavelet spaces $W^{(\ell)}$ and $\tilde W^{(\ell)}$. While this seems very natural, the geometry is more complicated than in the orthogonal case. On the one hand, we have the decompositions

$$ V^{(\ell)} = V^{(\ell+1)} \oplus W^{(\ell+1)}, \tag{12.99} $$
$$ \tilde V^{(\ell)} = \tilde V^{(\ell+1)} \oplus \tilde W^{(\ell+1)}, \tag{12.100} $$

as can be verified by using the two-scale equations for the scaling functions and wavelets involved. On the other hand, unlike the orthonormal case, $V^{(\ell)}$ is not orthogonal to $W^{(\ell)}$. Instead,

$$ \tilde W^{(\ell)} \perp V^{(\ell)}, \qquad W^{(\ell)} \perp \tilde V^{(\ell)}, $$

similarly to a biorthogonal basis (see Figure 7.11). We explore these relationships in Exercise 12.15.

The embedding (12.86a) then has two forms:

$$ \ldots \subset V^{(2)} \subset V^{(1)} \subset V^{(0)} \subset V^{(-1)} \subset V^{(-2)} \subset \ldots, $$

with detail spaces $\{W^{(\ell)}\}_{\ell \in \mathbb{Z}}$, or

$$ \ldots \subset \tilde V^{(2)} \subset \tilde V^{(1)} \subset \tilde V^{(0)} \subset \tilde V^{(-1)} \subset \tilde V^{(-2)} \subset \ldots, $$

with detail spaces $\{\tilde W^{(\ell)}\}_{\ell \in \mathbb{Z}}$. The detail spaces allow us to write

$$ \mathcal{L}^2(\mathbb{R}) = \bigoplus_{\ell \in \mathbb{Z}} W^{(\ell)} = \bigoplus_{\ell \in \mathbb{Z}} \tilde W^{(\ell)}. $$

The diagram in Figure 12.38 illustrates these two splits and the biorthogonality between them.

Figure 12.38: The space $\mathcal{L}^2(\mathbb{R})$ is split according to two different embeddings. (a) Embedding $V^{(\ell)}$ based on the scaling function $\varphi(t)$. (b) Embedding $\tilde V^{(\ell)}$ based on the dual scaling function $\tilde\varphi(t)$. Note that orthogonality is "across" the spaces and their duals, for example, $\tilde W^{(\ell)} \perp V^{(\ell)}$.
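The hat function mentioned in Section 12.3.4 gives a concrete instance. Centered at the origin, it satisfies the two-scale equation with $g = \frac{1}{2\sqrt2}[1 \;\; 2 \;\; 1]$. The same computation as in the orthonormal case suggests that biorthogonality $\langle\varphi(t), \tilde\varphi(t-k)\rangle_t = \delta_k$ requires $\sum_n g_n \tilde g_{n-2k} = \delta_k$. The sketch below checks the two-scale equation pointwise and then checks that discrete condition against an assumed dual filter, the LeGall 5/3 analysis lowpass $\tilde g = \frac{1}{4\sqrt2}[-1 \;\; 2 \;\; 6 \;\; 2 \;\; -1]$. That dual is our choice, one of many possible, and only the discrete condition is checked here, not the existence of $\tilde\varphi(t)$.

```python
import math


def hat(t):
    """Centered hat function: 1 - |t| on [-1, 1], 0 elsewhere."""
    return max(0.0, 1.0 - abs(t))


R2 = math.sqrt(2.0)
# Two-scale coefficients of the centered hat: hat(t) = sqrt(2) * sum_n g_n hat(2t - n).
g = {-1: 1 / (2 * R2), 0: 2 / (2 * R2), 1: 1 / (2 * R2)}
# Assumed dual lowpass filter (LeGall 5/3 analysis filter), not taken from the text.
g_dual = {n: c / (4 * R2) for n, c in zip(range(-2, 3), (-1, 2, 6, 2, -1))}


def two_scale_residual(t):
    """hat(t) minus the right-hand side of its two-scale equation."""
    return hat(t) - R2 * sum(gn * hat(2 * t - n) for n, gn in g.items())


def biorth(p, q, k):
    """sum_n p_n q_{n-2k}; equals delta_k for a biorthogonal pair of lowpass filters."""
    return sum(pn * q.get(n - 2 * k, 0.0) for n, pn in p.items())


print(max(abs(two_scale_residual(-1.5 + 0.01 * i)) for i in range(301)))
print([biorth(g, g_dual, k) for k in range(-2, 3)])  # approximately [0, 0, 1, 0, 0]
```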



12.4 Wavelet Frame Series

12.4.1 Definition of the Wavelet Frame Series 

12.4.2 Frames from Sampled Wavelet Series 
12.5 Continuous Wavelet Transform 

12.5.1 Definition of the Continuous Wavelet Transform 

The continuous wavelet transform uses a function $\psi(t)$ and all its shifted and scaled versions to analyze functions. Here we consider only real wavelets; this can be extended to complex wavelets without too much difficulty.

Consider a real wavelet $\psi(t) \in \mathcal{L}^2(\mathbb{R})$ centered around $t = 0$ and having at least one zero moment (i.e., $\int \psi(t)\, dt = 0$). Now, consider all its shifts and scales, denoted by

$$ \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right), \qquad a \in \mathbb{R}^+, \; b \in \mathbb{R}, \tag{12.101} $$

which means that $\psi_{a,b}(t)$ is centered around $b$ and scaled by a factor $a$. The scale factor $1/\sqrt{a}$ ensures that the $\mathcal{L}^2$ norm is preserved, and without loss of generality, we can assume $\|\psi\| = 1$ and thus $\|\psi_{a,b}\| = 1$.

There is one more condition on the wavelet, namely the admissibility condition stating that the Fourier transform $\Psi(\omega)$ must satisfy

$$ C_\psi = \int_{\omega \in \mathbb{R}^+} \frac{|\Psi(\omega)|^2}{\omega}\, d\omega < \infty. \tag{12.102} $$



Since $|\Psi(0)| = 0$ because of the zero moment property, the integrand is well behaved near $\omega = 0$ for a reasonably smooth $\Psi(\omega)$, and the remaining requirement is that $|\Psi(\omega)|$ decay for large $\omega$, which it will if $\psi$ has any smoothness. In short, (12.102) is a very mild requirement that is satisfied by all wavelets of interest (see, for example, Exercise 12.10).

Figure 12.39: The wavelet transform. (a) An example function. (b) The magnitude of the wavelet transform $|X(a,b)|$.

Now, given a function $x(t)$ in $\mathcal{L}^2(\mathbb{R})$, we can define its continuous wavelet transform as

$$ X(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} \psi\!\left(\frac{t - b}{a}\right) x(t)\, dt = \int_{-\infty}^{\infty} \psi_{a,b}(t)\, x(t)\, dt = \langle x, \psi_{a,b} \rangle. \tag{12.103} $$

In words, we take the inner product of the function $x(t)$ with a wavelet centered at location $b$ and rescaled by a factor $a$, as shown in Figure 12.16. A numerical example is given in Figure 12.39, which displays the magnitude $|X(a,b)|$ as an image. It is already clear that the continuous wavelet transform acts as a singularity detector or derivative operator, and that smooth regions are suppressed, which follows from the zero moment property.
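A numerical sketch of (12.103) follows. It assumes the Mexican hat wavelet $\psi(t) = \frac{2}{\sqrt3\,\pi^{1/4}}(1-t^2)e^{-t^2/2}$, which is real, has unit norm, and has one zero moment; this wavelet is not a choice made in the text. $X(a,b)$ is evaluated by a midpoint rule. For a unit step, the substitution $u = (t-b)/a$ and the antiderivative $u e^{-u^2/2}$ of $(1-u^2)e^{-u^2/2}$ give the closed form $X(a,b) = C\sqrt{a}\,(b/a)\,e^{-b^2/(2a^2)}$, with $C = 2/(\sqrt3\,\pi^{1/4})$, which the code uses as a reference. The response is concentrated near the jump and grows like $\sqrt{a}$, while far from the jump it is suppressed by the zero moment.

```python
import math

C = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)  # makes ||psi|| = 1


def psi(t):
    """Mexican hat wavelet (assumed example): real, centered at 0, one zero moment."""
    return C * (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt(x, a, b, t_min, t_max, n=20000):
    """X(a, b) = (1/sqrt(a)) * integral of x(t) psi((t - b)/a) over [t_min, t_max], midpoint rule."""
    dt = (t_max - t_min) / n
    total = 0.0
    for i in range(n):
        t = t_min + (i + 0.5) * dt
        total += x(t) * psi((t - b) / a)
    return total * dt / math.sqrt(a)


def unit_step(t):
    return 1.0 if t >= 0.0 else 0.0


def step_cwt_exact(a, b):
    """Closed form for the unit step: C * sqrt(a) * (b/a) * exp(-(b/a)^2 / 2)."""
    return C * math.sqrt(a) * (b / a) * math.exp(-((b / a) ** 2) / 2.0)


for a in (0.25, 1.0, 4.0):
    b = 1.5 * a
    # The integrand vanishes for t < 0 and is negligible beyond b + 12a.
    print(a, cwt(unit_step, a, b, 0.0, b + 12.0 * a), step_cwt_exact(a, b))
```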

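Let us rewrite the continuous wavelet transform at scale $a$ as a convolution. For this, it will be convenient to introduce the scaled and normalized version of the wavelet,

$$ \psi_a(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t}{a}\right) \;\overset{\text{FT}}{\longleftrightarrow}\; \Psi_a(\omega) = \sqrt{a}\, \Psi(a\omega), \tag{12.104} $$

as well as the notation $\bar\psi(t) = \psi(-t)$. Then

$$ X(a,b) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right) x(t)\, dt = \int_{-\infty}^{\infty} \psi_a(t - b)\, x(t)\, dt = (x * \bar\psi_a)(b). \tag{12.105} $$

Now the Fourier transform of $X(a,b)$ over the "time" variable $b$ is

$$ X(a,\omega) = X(\omega)\, \Psi_a^*(\omega) = X(\omega)\, \sqrt{a}\, \Psi^*(a\omega), \tag{12.106} $$

where we used $\psi(-t) \overset{\text{FT}}{\longleftrightarrow} \Psi^*(\omega)$, since $\psi(t)$ is real.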

12.5.2 Existence and Convergence of the Continuous Wavelet Transform

The invertibility of the continuous wavelet transform is of course a key result: not only can we compute the continuous wavelet transform, but we are actually able to come back! This inversion formula was first proposed by J. Morlet.^160

Proposition 12.13 (Inversion of the continuous wavelet transform) Consider a real wavelet $\psi$ satisfying the admissibility condition (12.102). A function $x \in \mathcal{L}^2(\mathbb{R})$ can be recovered from its continuous wavelet transform $X(a,b)$ by the inversion formula

$$ x(t) = \frac{1}{C_\psi} \int_{0}^{\infty} \int_{-\infty}^{\infty} X(a,b)\, \psi_{a,b}(t)\, \frac{db\, da}{a^2}, \tag{12.107} $$

where equality is in the $\mathcal{L}^2$ sense.

Proof. Denote the right-hand side of (12.107) by $\hat x(t)$. In that expression, we replace $X(a,b)$ by (12.105) and $\psi_{a,b}(t)$ by $\psi_a(t - b)$ to obtain

$$ \hat x(t) = \frac{1}{C_\psi} \int_{0}^{\infty} \int_{-\infty}^{\infty} (x * \bar\psi_a)(b)\, \psi_a(t - b)\, \frac{db\, da}{a^2} = \frac{1}{C_\psi} \int_{0}^{\infty} (x * \bar\psi_a * \psi_a)(t)\, \frac{da}{a^2}, $$

where the integral over $b$ was recognized as a convolution. We will show the $\mathcal{L}^2$ equality of $x(t)$ and $\hat x(t)$ through the equality of their Fourier transforms. The Fourier transform of $\hat x(t)$ is

$$ \begin{aligned} \hat X(\omega) &= \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{0}^{\infty} (x * \bar\psi_a * \psi_a)(t)\, e^{-j\omega t}\, \frac{da}{a^2}\, dt \\ &\overset{(a)}{=} \frac{1}{C_\psi} \int_{0}^{\infty} X(\omega)\, \Psi_a^*(\omega)\, \Psi_a(\omega)\, \frac{da}{a^2} \\ &\overset{(b)}{=} X(\omega)\, \frac{1}{C_\psi} \int_{0}^{\infty} a\, |\Psi(a\omega)|^2\, \frac{da}{a^2}, \end{aligned} \tag{12.108} $$

where in (a) we integrated first over $t$ and transformed the two convolutions into products; and in (b) we used (12.104). In the remaining integral above, apply the change of variable $\Omega = a\omega$ to compute, for $\omega > 0$,

$$ \int_{0}^{\infty} |\Psi(a\omega)|^2\, \frac{da}{a} = \int_{0}^{\infty} \frac{|\Psi(\Omega)|^2}{\Omega}\, d\Omega = C_\psi; \tag{12.109} $$

for $\omega < 0$ the same value is obtained because $|\Psi(-\Omega)| = |\Psi(\Omega)|$ for a real $\psi$. Together with (12.108), this shows that $\hat X(\omega) = X(\omega)$. By Fourier inversion, we have proven that $\hat x(t) = x(t)$ in the $\mathcal{L}^2$ sense.
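The key step (12.109) says that $\int_0^\infty |\Psi(a\omega)|^2\,da/a$ does not depend on $\omega$. The sketch below checks this numerically for an assumed wavelet, the unit-norm Mexican hat. With the convention $\Psi(\omega) = \int\psi(t)e^{-j\omega t}\,dt$, its transform is $\Psi(\omega) = \frac{2\sqrt{2\pi}}{\sqrt3\,\pi^{1/4}}\,\omega^2 e^{-\omega^2/2}$, so that $C_\psi = 4\sqrt{\pi}/3$. The closed form of $\Psi$ is cross-checked at one frequency by quadrature. Substituting $a = e^s$ turns $da/a$ into $ds$, and a negative frequency is included to illustrate $|\Psi(-\omega)| = |\Psi(\omega)|$ for a real $\psi$.

```python
import math

C = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)  # unit-norm Mexican hat (assumed example)


def psi(t):
    return C * (1.0 - t * t) * math.exp(-t * t / 2.0)


def psi_hat_sq(w):
    """|Psi(w)|^2, where Psi(w) = C sqrt(2 pi) w^2 exp(-w^2 / 2)."""
    return 2.0 * math.pi * C * C * w ** 4 * math.exp(-w * w)


def midpoint(f, lo, hi, n=40000):
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def scale_integral(w):
    """Integral over a > 0 of |Psi(a w)|^2 da / a, computed with a = e^s so that da / a = ds."""
    return midpoint(lambda s: psi_hat_sq(math.exp(s) * w), -30.0, 10.0)


c_psi = midpoint(lambda w: psi_hat_sq(w) / w, 0.0, 20.0)  # admissibility constant (12.102)
# psi is even, so its Fourier transform reduces to the cosine integral.
w0 = 1.3
psi_hat_numeric = midpoint(lambda t: psi(t) * math.cos(w0 * t), -20.0, 20.0)
print(c_psi, 4.0 * math.sqrt(math.pi) / 3.0)
for w in (0.5, 1.0, 3.0, -2.0):
    print(w, scale_integral(w))
```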

The formula (12.107) is also sometimes called the resolution of the identity and goes back to Calderón in the 1960s in a context other than wavelets.

12.5.3 Properties of the Continuous Wavelet Transform
Linearity

160 The story goes that Morlet asked a mathematician for a proof, but only got as an answer: "This formula, being so simple, would be known if it were correct."

Figure 12.40: The shift property of the continuous wavelet transform.

Shift in Time The continuous wavelet transform has a number of properties, several of these being extensions or generalizations of properties seen already for wavelet series. Let us start with shift and scale invariance. Consider $g(t) = x(t - \tau)$, a delayed version of $x(t)$. Then

$$ X_g(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} \psi\!\left(\frac{t - b}{a}\right) x(t - \tau)\, dt = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} \psi\!\left(\frac{t' + \tau - b}{a}\right) x(t')\, dt' = X(a, b - \tau), \tag{12.110} $$

by using the change of variables $t' = t - \tau$. That is, the continuous wavelet transform of $g$ is simply a delayed version of the wavelet transform of $x(t)$, as shown in Figure 12.40.

Scaling in Time For the scaling property, consider a scaled and normalized version of $x(t)$,

$$ g(t) = \frac{1}{\sqrt{s}}\, x\!\left(\frac{t}{s}\right), $$

where the renormalization ensures that $\|g\| = \|x\|$. Computing the continuous wavelet transform of $g$, using the change of variables $t' = t/s$, gives

$$ X_g(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} \psi\!\left(\frac{t - b}{a}\right) \frac{1}{\sqrt{s}}\, x\!\left(\frac{t}{s}\right) dt = \sqrt{\frac{s}{a}} \int_{-\infty}^{\infty} \psi\!\left(\frac{t' - b/s}{a/s}\right) x(t')\, dt' = X\!\left(\frac{a}{s}, \frac{b}{s}\right). \tag{12.111} $$

In words: if $g(t)$ is a version of $x(t)$ scaled by a factor $s$ and normalized to maintain its energy, then its continuous wavelet transform is a version of $X(a,b)$ scaled by $s$ in both $a$ and $b$. A graphical representation of the scaling property is shown in Figure 12.41.

Figure 12.41: The scaling property of the continuous wavelet transform.

Consider now a function $x(t)$ with unit energy and having its wavelet transform concentrated mostly in a unit square, say $[a_0, a_0 + 1] \times [b_0, b_0 + 1]$. The continuous wavelet transform of $g(t)$ is then mostly concentrated in the square $[s a_0, s(a_0 + 1)] \times [s b_0, s(b_0 + 1)]$, a cell of area $s^2$. But remember that $g(t)$ still has unit energy, while its continuous wavelet transform now covers an area increased by a factor $s^2$. Therefore, when evaluating an energy measure in the continuous wavelet transform domain, we need to renormalize by a factor $a^2$, as was seen in both the inversion formula (12.107) and the energy conservation formula (12.112).

When comparing the above properties with the equivalent ones from wavelet series, the major difference is that shift and scale are arbitrary real variables, rather than constrained dyadic rationals (powers of 2 for the scale, multiples of the scale for shifts). Therefore, we obtain true time-scale and shift properties.
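The shift property (12.110) and the scaling property (12.111) can both be checked numerically with an assumed wavelet, the unit-norm Mexican hat, and a midpoint-rule evaluation of (12.103). The test function, the shift $\tau = 0.7$, the scale $s = 2$, and the evaluation point $(a, b)$ below are arbitrary choices of ours. The two printed pairs should agree to quadrature accuracy.

```python
import math

C = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)


def psi(t):
    """Unit-norm Mexican hat (an assumed example wavelet)."""
    return C * (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt(x, a, b, lo=-40.0, hi=40.0, n=40000):
    """X(a, b) = (1/sqrt(a)) * integral of x(t) psi((t - b)/a) dt, midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += x(t) * psi((t - b) / a)
    return total * h / math.sqrt(a)


def x(t):
    """A smooth, nonsymmetric test function (our own choice)."""
    return t * math.exp(-t * t / 2.0) + 0.5 * math.exp(-((t - 1.0) ** 2))


tau, s = 0.7, 2.0


def x_shifted(t):
    return x(t - tau)


def x_scaled(t):
    return x(t / s) / math.sqrt(s)


a, b = 0.8, 1.1
print(cwt(x_shifted, a, b), cwt(x, a, b - tau))   # shift property (12.110)
print(cwt(x_scaled, a, b), cwt(x, a / s, b / s))  # scaling property (12.111)
```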



Parseval's Equality Closely related to the resolution of the identity is an energy 
conservation formula, an analogue to Parseval's equality. 



Proposition 12.14 (Energy conservation of the continuous wavelet transform)
Consider a function x ∈ L²(ℝ) and its continuous wavelet transform X(a, b) with respect to a real wavelet ψ satisfying the admissibility condition (12.102). Then, the following energy conservation holds:

$$ \int_{t\in\mathbb{R}} |x(t)|^2\,dt = \frac{1}{C_\psi}\int_{a\in\mathbb{R}^+}\int_{b\in\mathbb{R}} |X(a,b)|^2\,\frac{db\,da}{a^2}. \tag{12.112} $$
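Proof. Expand the right-hand side (without the leading constant) as

$$ \int_{a\in\mathbb{R}^+}\int_{b\in\mathbb{R}} |X(a,b)|^2\,\frac{db\,da}{a^2} \overset{(a)}{=} \int_{a\in\mathbb{R}^+}\int_{b\in\mathbb{R}} \left|(x*\bar{\psi}_a)(b)\right|^2 db\,\frac{da}{a^2} \overset{(b)}{=} \int_{a\in\mathbb{R}^+}\frac{1}{2\pi}\int_{\omega\in\mathbb{R}} \left|X(\omega)\sqrt{a}\,\Psi^*(a\omega)\right|^2 d\omega\,\frac{da}{a^2} = \frac{1}{2\pi}\int_{a\in\mathbb{R}^+}\int_{\omega\in\mathbb{R}} |X(\omega)|^2\,|\Psi(a\omega)|^2\,d\omega\,\frac{da}{a}, $$

where (a) uses (12.105); and (b) uses Parseval's equality for the Fourier transform with respect to b, also transforming the convolution into a product. Changing the order of integration and in (c) using the change of variables Ω = aω allows us to write the above as

$$ \int_{a\in\mathbb{R}^+}\int_{b\in\mathbb{R}} |X(a,b)|^2\,\frac{db\,da}{a^2} = \frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |X(\omega)|^2 \int_{a\in\mathbb{R}^+} |\Psi(a\omega)|^2\,\frac{da}{a}\,d\omega \overset{(c)}{=} \frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |X(\omega)|^2 \int_{\Omega\in\mathbb{R}^+} \frac{|\Psi(\Omega)|^2}{\Omega}\,d\Omega\,d\omega = \frac{C_\psi}{2\pi}\int_{\omega\in\mathbb{R}} |X(\omega)|^2\,d\omega, $$

where for ω < 0 we also used |Ψ(−Ω)| = |Ψ(Ω)|, which holds for a real wavelet. Therefore

$$ \frac{1}{C_\psi}\int_{a\in\mathbb{R}^+}\int_{b\in\mathbb{R}} |X(a,b)|^2\,\frac{db\,da}{a^2} = \frac{1}{2\pi}\int_{\omega\in\mathbb{R}} |X(\omega)|^2\,d\omega, $$

and applying Parseval's equality to the right side proves (12.112).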

Both the inversion formula and the energy conservation formula use db da/a² as an integration measure. This is related to the scaling property of the continuous wavelet transform discussed earlier. Note that the extension to a complex wavelet is not hard; the integral over a has to go from −∞ to ∞, and C_ψ has to be defined accordingly.
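The heart of the proof is step (c): the integral ∫_{a>0} |Ψ(aω)|² da/a does not depend on ω. The sketch below checks this numerically and then evaluates the frequency-domain form of the right side of (12.112). It assumes, for illustration only, the Mexican hat wavelet ψ(t) = (1 − t²)e^{−t²/2}, with Fourier transform Ψ(ω) = √(2π) ω² e^{−ω²/2}, for which ∫_0^∞ |Ψ(Ω)|²/Ω dΩ = π, and the signal x(t) = e^{−t²/2}, whose energy is √π. All integrals are midpoint-rule approximations over truncated ranges, with a = e^u so that da/a = du.

```python
import math

def Psi(w):
    # Fourier transform of the Mexican hat psi(t) = (1 - t^2) exp(-t^2/2)
    return math.sqrt(2 * math.pi) * w * w * math.exp(-0.5 * w * w)

def X_hat(w):
    # Fourier transform of the test signal x(t) = exp(-t^2/2)
    return math.sqrt(2 * math.pi) * math.exp(-0.5 * w * w)

def scale_integral(w, u_min=-10.0, u_max=10.0, n=2000):
    """Integral over a > 0 of |Psi(a w)|^2 da/a, using a = e^u so that da/a = du."""
    h = (u_max - u_min) / n
    return h * sum(Psi(math.exp(u_min + (k + 0.5) * h) * w) ** 2 for k in range(n))

# Step (c): the scale integral is the same for every w != 0 (pi for this wavelet).
for w in (0.3, 1.0, 4.0):
    print(f"w = {w}: integral of |Psi(a w)|^2 da/a = {scale_integral(w):.8f}")

C_psi = scale_integral(1.0)

# Frequency-domain form of the right side of (12.112):
# (1/C_psi) (1/2pi) integral over w of |X(w)|^2 times the scale integral.
# The integrand is even in w, so integrate over w > 0 and double.
n_w, w_max = 400, 8.0
h_w = w_max / n_w
total = 0.0
for k in range(n_w):
    w = (k + 0.5) * h_w
    total += 2.0 * X_hat(w) ** 2 * scale_integral(w, n=400) * h_w
rhs = total / (2 * math.pi * C_psi)

print(f"right side of (12.112): {rhs:.8f}, energy of x: {math.sqrt(math.pi):.8f}")
```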

Redundancy The continuous wavelet transform maps a one-dimensional function into a two-dimensional one: this is clearly very redundant. In other words, only a small subset of two-dimensional functions corresponds to wavelet transforms. We are thus interested in characterizing the image of one-dimensional functions in the continuous wavelet transform domain.

A simple analogue is in order. Consider an M by N matrix T having orthonormal columns (i.e., TᵀT = I) with M > N. Suppose y is the image of an arbitrary vector x ∈ ℝᴺ through the operator T, or y = Tx. Clearly y belongs to a subspace S of ℝᴹ, namely the span of the columns of T.

There is a simple test to check if an arbitrary vector z ∈ ℝᴹ belongs to S. Introduce the kernel matrix K,

$$ K = T\,T^T, \tag{12.113} $$

which is the M by M matrix obtained by summing the outer products of the columns of T. Then, a vector z belongs to S if and only if it satisfies

$$ K z = z. \tag{12.114} $$
Indeed, if z is in S, then it can be written as z = Tx for some x. Substituting this into the left side of (12.114) leads to

$$ K z = T\,T^T T\,x = T x = z. $$

Conversely, if (12.114) holds, then z = Kz = T(Tᵀz) = Tx with x = Tᵀz, showing that z belongs to S.

If z is not in S, then Kz ≠ z; instead, Kz is the orthogonal projection of z onto S, as can be verified. See Exercise 12.11 for a discussion of this, as well as the case of non-orthonormal columns in T.
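The test (12.114) is easy to try numerically. In the sketch below (plain Python lists, with a hypothetical 3 × 2 matrix T chosen for illustration), the columns of T are orthonormal, K = TTᵀ leaves a vector in the column span unchanged, and maps a vector outside S to its orthogonal projection onto S, so that the residual is orthogonal to every column of T.

```python
import math

def matmul(A, B):
    # Plain-list matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

r = 1 / math.sqrt(2)
# M = 3, N = 2: two orthonormal columns spanning a plane S in R^3 (illustrative choice)
T = [[r, 0.0],
     [r, 0.0],
     [0.0, 1.0]]
K = matmul(T, transpose(T))      # kernel matrix (12.113), 3 x 3

z_in = matvec(T, [2.0, -1.0])    # z = T x lies in S
z_out = [1.0, 0.0, 0.0]          # a vector not in S

print("K z_in  =", matvec(K, z_in), " z_in =", z_in)   # K z = z
print("K z_out =", matvec(K, z_out))                   # projection of z_out onto S
```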

We now extend the test given in (12.114) to the case of the continuous wavelet transform. For this, let us introduce the reproducing kernel of the wavelet ψ(t), defined as

$$ K(a_0, b_0, a, b) = \langle \psi_{a_0,b_0}, \psi_{a,b} \rangle. \tag{12.115} $$

This is the deterministic crosscorrelation of two wavelets at scales and shifts (a₀, b₀) and (a, b), respectively, and is the equivalent of the matrix K in (12.113).

Call V the space of functions X(a, b) that are square integrable with respect to the measure db da/a² (see also Proposition 12.14). In this space, there exists a subspace S that corresponds to bona fide continuous wavelet transforms. Similarly to what we just did in finite dimensions, we give a test to check whether a function X(a, b) in V actually belongs to S, that is, whether it is the continuous wavelet transform of some one-dimensional function x(t).

Proposition 12.15 (Reproducing kernel property of the continuous wavelet transform)
A function X(a, b) is the continuous wavelet transform of a function x(t) if and only if it satisfies

$$ X(a_0, b_0) = \frac{1}{C_\psi}\int_0^{\infty}\int_{-\infty}^{\infty} K(a_0,b_0,a,b)\,X(a,b)\,\frac{db\,da}{a^2}. \tag{12.116} $$

Proof. We show that if X(a, b) is the continuous wavelet transform of some function x(t), then (12.116) holds. Completing the proof by showing that the converse is also true is left as Exercise 12.12.
By assumption,

$$ X(a_0,b_0) = \int_{-\infty}^{\infty} \psi_{a_0,b_0}(t)\,x(t)\,dt. $$

Replace x(t) by its inversion formula (12.107), or

$$ X(a_0,b_0) = \int_{-\infty}^{\infty} \psi_{a_0,b_0}(t)\,\frac{1}{C_\psi}\int_0^{\infty}\int_{-\infty}^{\infty} \psi_{a,b}(t)\,X(a,b)\,\frac{db\,da}{a^2}\,dt \overset{(a)}{=} \frac{1}{C_\psi}\int_0^{\infty}\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} \psi_{a_0,b_0}(t)\,\psi_{a,b}(t)\,dt\right) X(a,b)\,\frac{db\,da}{a^2} \overset{(b)}{=} \frac{1}{C_\psi}\int_0^{\infty}\int_{-\infty}^{\infty} K(a_0,b_0,a,b)\,X(a,b)\,\frac{db\,da}{a^2}, $$

where (a) we interchanged the order of integration; and (b) we integrated over t to get the reproducing kernel (12.115).
Characterization of Singularities The continuous wavelet transform has an interesting localization property, which is related to the fact that as a → 0, the wavelet ψ_{a,b}(t) becomes arbitrarily narrow, performing a zoom in the vicinity of b. This is easiest to see for x(t) = δ(t − τ). Then

$$ X(a,b) = \frac{1}{\sqrt{a}}\int_{t\in\mathbb{R}} \psi\!\left(\frac{t-b}{a}\right)\delta(t-\tau)\,dt = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{\tau-b}{a}\right). \tag{12.117} $$

This is the wavelet scaled by a and, as a function of b, centered at τ. As a → 0, the continuous wavelet transform narrows exactly on the singularity and grows as a^{−1/2}.

A similar behavior can be shown for other singularities as well, which we do now. For simplicity, we consider a compactly supported wavelet with N zero moments. We have seen the most elementary case, namely the Haar wavelet (with a single zero moment), in Section 12.1. Another example is the ramp function starting at τ:

$$ x(t) = \begin{cases} 0, & t \le \tau; \\ t - \tau, & t > \tau. \end{cases} $$

This function is continuous, but its derivative is not. Actually, its second derivative is a Dirac delta function at location τ.

To analyze this function and its singularity, we need a wavelet with at least 2 zero moments. Given a compactly supported wavelet, its second-order primitive will be compactly supported as well. To compute the continuous wavelet transform X(a, b), we can apply integration by parts just like in (12.34) to obtain

$$ X(a,b) = -\sqrt{a}\int_{t\in\mathbb{R}} \theta\!\left(\frac{t-b}{a}\right) x'(t)\,dt, $$

where θ(t) is the primitive of ψ(t), and x'(t) is now a step function. We apply integration by parts one more time to get

$$ X(a,b) = -a^{3/2}\left[\theta^{(1)}\!\left(\frac{t-b}{a}\right) x'(t)\right]_{t\in\mathbb{R}} + a^{3/2}\int_{t\in\mathbb{R}} \theta^{(1)}\!\left(\frac{t-b}{a}\right) x''(t)\,dt = a^{3/2}\int_{t\in\mathbb{R}} \theta^{(1)}\!\left(\frac{t-b}{a}\right)\delta(t-\tau)\,dt = a^{3/2}\,\theta^{(1)}\!\left(\frac{\tau-b}{a}\right), \tag{12.118} $$

where θ^{(1)}(t) is the primitive of θ(t), and the factor a^{3/2} comes from an additional factor a due to integration of θ(t/a); the boundary term vanishes because θ^{(1)} has compact support. The key, of course, is that as a → 0, the continuous wavelet transform zooms towards the singularity and has a behavior of the order a^{3/2}. These are examples of the following general result.

Proposition 12.16 (Localization property of the continuous wavelet transform)
Consider a wavelet ψ of compact support having N zero moments and a function x(t) with a singularity of order n ≤ N (meaning the nth derivative is a Dirac delta function; for example, Dirac delta function = 0, step = 1, ramp = 2, etc.). Then, the wavelet transform in the vicinity of the singularity at τ is of the form

$$ X(a,b) = (-1)^n\,a^{n-1/2}\,\psi^{(n)}\!\left(\frac{\tau-b}{a}\right), \tag{12.119} $$

where ψ^{(n)} is the nth primitive of ψ.

Figure 12.42: A function with singularities of order 0, 1 and 2, and its wavelet transform.

Proof. (Sketch) The proof follows the arguments developed above for n = 0, 1, and 2. Because ψ(t) has N zero moments, its primitives of order n ≤ N are also compactly supported. For a singularity of order n, we apply integration by parts n times. Each primitive adds a scaling factor a; this explains the factor a^{n−1/2} (the −1/2 comes from the initial 1/√a factor in the wavelet), and each integration by parts contributes a sign change, which explains the factor (−1)^n. After n integrations by parts, x(t) has been differentiated n times, is thus a Dirac delta function, and reproduces ψ^{(n)} at location τ.



The key is that the singularities are not only precisely located at small scales, 
but the behavior of the continuous wavelet transform also indicates the singularity 
type. Figure 12.42 sketches the continuous wavelet transform of a function with a 
few singularities. 
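Formula (12.119) can be compared against direct numerical integration. The sketch below swaps in the Mexican hat ψ(t) = (1 − t²)e^{−t²/2}, which has two zero moments but is not compactly supported; it decays fast enough that the boundary terms in the integrations by parts still vanish. Its first two primitives are ψ^{(1)}(t) = t e^{−t²/2} and ψ^{(2)}(t) = −e^{−t²/2}. The transform is evaluated along the line b = τ − c a (here c = 0.5, an arbitrary choice), so that the argument (τ − b)/a of ψ^{(n)} stays fixed while a shrinks and the remaining dependence is the power a^{n−1/2}. The step (n = 1) and ramp (n = 2) signals, the value of τ, and the helper names are illustrative assumptions.

```python
import math

def psi(t):
    # Mexican hat: two zero moments, Gaussian decay (not compactly supported)
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

# n-th primitives psi^(n) of the Mexican hat, for n = 1, 2
primitive = {
    1: lambda t: t * math.exp(-0.5 * t * t),
    2: lambda t: -math.exp(-0.5 * t * t),
}

tau = 0.7  # location of the singularity (arbitrary)
signal = {
    1: lambda t: 1.0 if t > tau else 0.0,      # step: singularity of order 1
    2: lambda t: t - tau if t > tau else 0.0,  # ramp: singularity of order 2
}

def cwt_point(x, a, b, half_width=12.0, n=20000):
    # Midpoint rule for X(a, b) = integral of a^(-1/2) psi((t - b)/a) x(t) dt
    lo = b - half_width * a
    h = 2.0 * half_width * a / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += psi((t - b) / a) * x(t)
    return total * h / math.sqrt(a)

c = 0.5  # keep (tau - b)/a = c fixed by choosing b = tau - c*a

def compare(order, a):
    """Return (numerical X(a, b), prediction of (12.119)) at b = tau - c*a."""
    b = tau - c * a
    numeric = cwt_point(signal[order], a, b)
    predicted = (-1) ** order * a ** (order - 0.5) * primitive[order](c)
    return numeric, predicted

for order in (1, 2):
    for a in (0.1, 0.01):
        numeric, predicted = compare(order, a)
        print(f"n={order}, a={a}: numeric {numeric:.6e}, (12.119) {predicted:.6e}")
```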

We considered the behavior around points of singularity, but what about "smooth" regions? Again, assume a wavelet of compact support and having N zero moments. Clearly, if the function x(t) is a polynomial of degree N − 1 or less, all inner products with the wavelet will be exactly zero due to the zero-moment property. If the function x(t) is piecewise polynomial,¹⁶¹ then the inner product will be zero once the wavelet is inside an interval, while boundaries will be detected according to the types of singularities that appear. We have calculated an example in Section 12.1 for Haar, which makes the above explicit, while also pointing out what happens when the wavelet does not have enough zero moments.
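The role of zero moments in smooth regions is easy to see on a degree-1 polynomial. In the sketch below (illustrative choices: x(t) = 3t − 2, b = 1, and a centered Haar wavelet taken to be +1 on [−1/2, 0) and −1 on [0, 1/2), which may differ in sign convention from Section 12.1), the Haar wavelet, with a single zero moment, does not annihilate the line and gives X(a, b) = −(3/4)a^{3/2}, whereas the Mexican hat, with two zero moments, gives zero up to rounding.

```python
import math

def haar_centered(t):
    # Centered Haar wavelet (sign convention assumed): +1 on [-1/2, 0), -1 on [0, 1/2)
    if -0.5 <= t < 0.0:
        return 1.0
    if 0.0 <= t < 0.5:
        return -1.0
    return 0.0

def mexican_hat(t):
    # Two zero moments: the integrals of psi(t) and t*psi(t) both vanish
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_point(x, a, b, psi, half_width, n=20000):
    # Midpoint rule for X(a, b) = integral of a^(-1/2) psi((t - b)/a) x(t) dt
    # over [b - half_width*a, b + half_width*a]
    lo = b - half_width * a
    h = 2.0 * half_width * a / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += psi((t - b) / a) * x(t)
    return total * h / math.sqrt(a)

def line(t):
    # A degree-1 polynomial (illustrative choice)
    return 3.0 * t - 2.0

for a in (1.0, 0.25):
    x_haar = cwt_point(line, a, 1.0, haar_centered, half_width=0.5)
    x_hat = cwt_point(line, a, 1.0, mexican_hat, half_width=12.0)
    print(f"a={a}: Haar {x_haar:.6f} (-(3/4)a^(3/2) = {-0.75 * a ** 1.5:.6f}), "
          f"Mexican hat {x_hat:.1e}")
```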

Decay and Smoothness Beyond polynomial and piecewise-polynomial functions, 
let us consider more general smooth functions. Among the many possible classes 



¹⁶¹ That is, the function is a polynomial over intervals (t_i, t_{i+1}), with singularities at the interval boundaries.
of smooth functions, we consider functions having m continuous derivatives, or the space C^m.

For the wavelet, we take a compactly supported wavelet ψ having N zero moments. Then, the Nth primitive, denoted ψ^{(N)}, is compactly supported and

$$ \int_{t\in\mathbb{R}} \psi^{(N)}(t)\,dt = C \neq 0. $$

This follows since the Fourier transform of ψ has N zeros at the origin, and each integration removes one, leaving the Fourier transform of ψ^{(N)} nonzero at the origin. For example, the primitive of the Haar wavelet is the hat function in (12.33), with integral equal to 1/2.

Consider the following scaled version of ψ^{(N)}, namely a^{−1}ψ^{(N)}(t/a). This function has an integral equal to C, and it acts like a Dirac delta function as a → 0 in that, for a continuous function x(t),

$$ \lim_{a\to 0}\int_{t\in\mathbb{R}} \frac{1}{a}\,\psi^{(N)}\!\left(\frac{t-b}{a}\right) x(t)\,dt = C\,x(b). \tag{12.120} $$

Again, the Haar wavelet with its primitive is a typical example, since a limit of scaled hat functions is a classic way to obtain the Dirac delta function. We are now ready to prove the decay behavior of the continuous wavelet transform as a → 0.

Proposition 12.17 (Decay of continuous wavelet transform for x ∈ C^N)
Consider a compactly supported wavelet ψ with N zero moments, N ≥ 1, and primitives ψ^{(1)}, …, ψ^{(N)}, where ∫ψ^{(N)}(t) dt = C. Given a function x(t) having N continuous and bounded derivatives x^{(1)}, …, x^{(N)}, or x ∈ C^N, the continuous wavelet transform of x(t) with respect to ψ behaves as

$$ |X(a,b)| \le C'\,a^{N+1/2} \tag{12.121} $$

for a → 0.



Proof. (Sketch) The proof closely follows the method of integration by parts as used in Proposition 12.16. That is, we take the Nth derivative of x(t), x^{(N)}(t), which is continuous and bounded by assumption. We also have the Nth primitive of the wavelet, ψ^{(N)}(t), which is of compact support and has a finite integral. After N integrations by parts, we have

$$ X(a,b) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) x(t)\,dt \overset{(a)}{=} (-1)^N a^N \frac{1}{\sqrt{a}}\int_{-\infty}^{\infty} \psi^{(N)}\!\left(\frac{t-b}{a}\right) x^{(N)}(t)\,dt \overset{(b)}{=} (-1)^N a^{N+1/2}\int_{-\infty}^{\infty} \frac{1}{a}\,\psi^{(N)}\!\left(\frac{t-b}{a}\right) x^{(N)}(t)\,dt, $$

where (a) N steps of integration by parts contribute a factor a^N; and (b) we normalize the Nth primitive by 1/a so that it has a constant integral and acts as a Dirac delta function as a → 0. Therefore, for small a, the integral above tends towards C x^{(N)}(b), which is finite, and the decay of the continuous wavelet transform is thus of order a^{N+1/2}.

Figure 12.43: A function and its scalogram. (a) Function with various modes. (b) Scalogram with a Daubechies wavelet. (c) Scalogram with a symmetric wavelet. (d) Scalogram with a Morlet wavelet.

While we used a global smoothness, it is clear that it is sufficient for x(t) to be C^N in the vicinity of b for the decay to hold. The converse result, namely which decay of the wavelet transform guarantees that x(t) is in C^N, is a technical result that is more difficult to prove; it requires non-integer (Lipschitz) regularity. Note that if x(t) is smoother, that is, it has more than N continuous derivatives, the decay will still be of order a^{N+1/2}, since the N zero moments of the wavelet allow only N integration-by-parts steps. Also, the above result is valid for N ≥ 1 and thus cannot be applied to functions in C^0, but it can still be shown that the behavior is of order a^{1/2}, as is to be expected.
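The decay rate a^{N+1/2} and the limit C x^{(N)}(b) appearing in the proof can be observed numerically. The sketch below again uses the Mexican hat (N = 2; it is not compactly supported, an assumption of this example) whose second primitive is ψ^{(2)}(t) = −e^{−t²/2}, so that C = ∫ψ^{(2)}(t) dt = −√(2π). For the smooth signal x(t) = cos t, the proof predicts X(a, b)/a^{5/2} → C x''(b) = √(2π) cos b as a → 0. For this particular pair one can also show X(a, b) = √(2π) a^{5/2} e^{−a²/2} cos b exactly, since Ψ(ω) = √(2π) ω² e^{−ω²/2}. The value b = 0.4 is arbitrary.

```python
import math

def psi(t):
    # Mexican hat: N = 2 zero moments
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_point(x, a, b, half_width=12.0, n=20000):
    # Midpoint rule for X(a, b) = integral of a^(-1/2) psi((t - b)/a) x(t) dt
    lo = b - half_width * a
    h = 2.0 * half_width * a / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += psi((t - b) / a) * x(t)
    return total * h / math.sqrt(a)

b = 0.4
C = -math.sqrt(2 * math.pi)   # integral of psi^(2)(t) = -exp(-t^2/2)
x_dd = -math.cos(b)           # second derivative of cos t at t = b

for a in (0.5, 0.1, 0.02):
    ratio = cwt_point(math.cos, a, b) / a ** 2.5
    print(f"a = {a}: X(a, b) / a^(5/2) = {ratio:.6f}")
print(f"predicted limit C * x''(b) = {C * x_dd:.6f}")
```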

Scalograms So far, we have only sketched continuous wavelet transforms, to point out general behavior like localization and other relevant properties. For "real" functions, a usual way of displaying the continuous wavelet transform is the density plot of the continuous wavelet transform magnitude |X(a, b)|. This is done in Figure 12.43 for a particular function and for 3 different wavelets, namely an orthogonal Daubechies wavelet, a symmetric biorthogonal wavelet, and the Morlet wavelet.

As can be seen, the scalograms with respect to symmetric wavelets (Figure 12.43(c) and (d)) have no drift across scales, which helps identify singularities. The zooming property at small scales is quite evident from the scalogram.

Remarks The continuous-time continuous wavelet transform can be seen as a mathematical microscope. Indeed, it can zoom in, and describe the local behavior of a function very precisely. This pointwise characterization is a distinguishing feature of the continuous wavelet transform. The characterization itself is related to the wavelet being a local derivative operator. Indeed, a wavelet with N zero moments acts like an Nth-order derivative on the function analyzed by the wavelet transform, as was seen in the proofs of Propositions 12.16 and 12.17. Together with the fact that all scales are considered, this shows that the continuous wavelet transform is a multiscale differential operator.
Figure 12.44: Morlet wavelet. (a) Time-domain function, with real and imaginary parts in solid and dotted lines, respectively. (b) Magnitude spectrum of the Fourier transform.



Compactly Supported Wavelets: Throughout the discussion so far, we have 
often used the Haar wavelet (actually, its centered version) as the exemplary wavelet 
used in a continuous wavelet transform. The good news is that it is simple, short, 
and antisymmetric around the origin. The limitation is that in the frequency domain 
it has only a single zero at the origin; thus it can only characterize singularities up 
to order 1, and the decay of the continuous wavelet transform for smooth functions 
is limited. 

Therefore, one can use higher-order wavelets, like any of the Daubechies wavelets, or any biorthogonal wavelet. The key is the number of zeros at the origin. The attraction of biorthogonal wavelets is that there are symmetric or antisymmetric solutions. Thus, singularities are well localized along vertical lines, which is not the case for non-symmetric wavelets like the Daubechies wavelets. At the same time, there is no need to restrict oneself to orthogonal or biorthogonal wavelets, since any function satisfying the admissibility condition (12.102) and having a sufficient number of zero moments will do. In the next subsection, scalograms will highlight differences between continuous wavelet transforms using different wavelets.

Morlet Wavelet: The classic, and historically first, wavelet is a windowed complex exponential, first proposed by Jean Morlet. As a window, a Gaussian bell shape is used, and the complex exponential makes it a bandpass filter. Specifically, the wavelet is given by

$$ \psi(t) = \frac{1}{\sqrt{2\pi}}\,e^{j\omega_0 t}\,e^{-t^2/2}, \qquad \text{with} \quad \omega_0 = \pi\sqrt{\frac{2}{\ln 2}}, \tag{12.122} $$

where ω₀ is such that the second maximum of Re(ψ(t)) is half of the first one (at t = 0), and the scale factor 1/√(2π) normalizes the Fourier transform, Ψ(ω) = e^{−(ω−ω₀)²/2}, to a unit peak value. It is to be noted that Ψ(0) ≠ 0, and as such the wavelet is not admissible. However, Ψ(0) = e^{−ω₀²/2} is very small (about 6.5 × 10^{−7}) and has numerically no consequence (and can be corrected by removing it from the wavelet). Figure 12.44 shows the Morlet wavelet in time and frequency domains.
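The design rule for ω₀ and the size of Ψ(0) can be checked in a few lines. The sketch below (plain Python; the grid step used to locate the maximum is an arbitrary choice) evaluates Re ψ(t) = (2π)^{−1/2} cos(ω₀t) e^{−t²/2}, finds its first local maximum for t > 0, and compares it with the peak at t = 0. Because the Gaussian envelope pulls that maximum slightly inside t = 2π/ω₀, the ratio comes out close to, but slightly above, one half, so the half-height rule holds approximately.

```python
import math

omega0 = math.pi * math.sqrt(2.0 / math.log(2.0))

def morlet_real(t):
    # Real part of psi(t) = (2 pi)^(-1/2) exp(j omega0 t) exp(-t^2 / 2)
    return math.cos(omega0 * t) * math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

# Walk along t > 0 on a fine grid until the first local maximum of Re psi(t)
step = 1e-4
t_peak = step
while not (morlet_real(t_peak) >= morlet_real(t_peak - step)
           and morlet_real(t_peak) >= morlet_real(t_peak + step)):
    t_peak += step
ratio = morlet_real(t_peak) / morlet_real(0.0)

# Psi(w) = exp(-(w - omega0)^2 / 2), so its value at the origin is:
Psi_at_0 = math.exp(-0.5 * omega0 ** 2)

print(f"omega0 = {omega0:.4f}")
print(f"second maximum at t = {t_peak:.4f}, ratio to first = {ratio:.4f}")
print(f"Psi(0) = {Psi_at_0:.2e}")
```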

a3.0 [October 2011] CC by-nc-nd

Comments to book-errata@FourierAndWavclets.org

Fourier and Wavelet Signal Processing Copyright 2011 M. Vetterli, J. Kovačević, and V. K. Goyal

852 Chapter 12. Wavelet Bases, Frames and Transforms on Functions

It is interesting to note that the Morlet wavelet and the Gabor function are related. From (12.122), the Morlet wavelet at scale a is

$$ \psi_{a,0}(t) = \frac{1}{\sqrt{2\pi a}}\,e^{j\omega_0 t/a}\,e^{-t^2/2a^2}, $$

while, following (11.3) and (11.6), the Gabor function at ω is

$$ g_{\omega,0}(t) = \frac{1}{\sqrt{2\pi a}}\,e^{j\omega t/a}\,e^{-t^2/2a^2}, $$

which are equal for ω = ω₀ = π√(2/ln 2) and the same scale factor a. Thus, there is a frequency and a scale where the continuous wavelet transform (with a Morlet wavelet) and a local Fourier transform (with a Gabor function) coincide.

12.6 Computational Aspects 

The multiresolution framework derived above is more than just of theoretical interest. In addition to allowing the construction of wavelets, like the spline and Meyer wavelets, it also has direct algorithmic implications, as we show by deriving Mallat's algorithm for the computation of wavelet series.

12.6.1 Wavelet Series: Mallat's Algorithm 

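Given a wavelet basis {ψ_{m,n}(t)}_{m,n∈ℤ}, any function x(t) can be written as

$$ x(t) = \sum_{m\in\mathbb{Z}}\sum_{n\in\mathbb{Z}} \beta_n^{(m)}\,\psi_{m,n}(t), $$

where

$$ \beta_n^{(m)} = \langle x, \psi_{m,n}\rangle. \tag{12.123} $$

Assume that only a finite-resolution version of x(t) can be acquired, in particular the projection of x(t) onto V^{(0)}, denoted f^{(0)}(t). Because

$$ V^{(0)} = \bigoplus_{m=1}^{\infty} W^{(m)}, $$

we can write

$$ f^{(0)}(t) = \sum_{m=1}^{\infty}\sum_{n\in\mathbb{Z}} \beta_n^{(m)}\,\psi_{m,n}(t). \tag{12.124} $$

Since f^{(0)}(t) ∈ V^{(0)}, we can also write

$$ f^{(0)}(t) = \sum_{n\in\mathbb{Z}} \alpha_n^{(0)}\,\varphi(t-n), \tag{12.125} $$

where

$$ \alpha_n^{(0)} = \langle x(t), \varphi(t-n)\rangle_t = \langle x, \varphi_{0,n}\rangle. $$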
Given these two ways of expressing f^{(0)}(t), how do we go from one to the other? The answer, as to be expected, lies in the two-scale equation, and leads to a filter-bank algorithm. Consider f^{(1)}(t), the projection of f^{(0)}(t) onto V^{(1)}. This involves computing the inner products

$$ \alpha_n^{(1)} = \left\langle f^{(0)}(t), \frac{1}{\sqrt{2}}\varphi(t/2-n)\right\rangle_t, \qquad n\in\mathbb{Z}. \tag{12.126} $$

From (12.48), we can write

$$ \frac{1}{\sqrt{2}}\varphi(t/2-n) = \sum_{k\in\mathbb{Z}} g_k\,\varphi(t-2n-k). \tag{12.127} $$

Replacing this and (12.125) into (12.126) leads to

$$ \alpha_n^{(1)} = \sum_{k\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} g_k\,\alpha_\ell^{(0)}\,\langle \varphi(t-2n-k), \varphi(t-\ell)\rangle_t \overset{(a)}{=} \sum_{\ell\in\mathbb{Z}} g_{\ell-2n}\,\alpha_\ell^{(0)} \overset{(b)}{=} (\tilde{g} * \alpha^{(0)})_{2n}, \tag{12.128} $$

where (a) follows because the inner product is 0 unless ℓ = 2n + k; and (b) simply rewrites the sum as a convolution, with

$$ \tilde{g}_n = g_{-n}. $$

The upshot is that the sequence α^{(1)} is obtained by convolving α^{(0)} with g̃ (the time-reversed impulse response of g) and downsampling by 2. The same development for the wavelet series coefficients

$$ \beta_n^{(1)} = \left\langle f^{(0)}(t), \frac{1}{\sqrt{2}}\psi(t/2-n)\right\rangle_t $$

yields

$$ \beta_n^{(1)} = (\tilde{h} * \alpha^{(0)})_{2n}, \tag{12.129} $$

where

$$ \tilde{h}_n = h_{-n} $$

is the time-reversed impulse response of the highpass filter h. The argument just developed holds irrespective of the scale at which we start, thus allowing us to split a function f^{(m)} in V^{(m)} into its components f^{(m+1)} in V^{(m+1)} and d^{(m+1)} in W^{(m+1)}. This split is achieved by filtering and downsampling α^{(m)} with g̃ and h̃, respectively. Likewise, this process can be iterated k times, to go from V^{(m)} to V^{(m+k)}, while splitting off W^{(m+1)}, W^{(m+2)}, …, W^{(m+k)}, or

$$ V^{(m)} = W^{(m+1)} \oplus W^{(m+2)} \oplus \cdots \oplus W^{(m+k)} \oplus V^{(m+k)}. $$

The key insight is of course that once we have an initial projection, for example, f^{(0)}(t) with expansion coefficients α_n^{(0)}, then all the other expansion coefficients can be computed using discrete-time filtering. This is shown in Figure 12.46, where the sequence α_n^{(0)}, corresponding to an initial projection of x(t) onto V^{(0)}, is decomposed into the expansion coefficients in W^{(1)}, W^{(2)}, W^{(3)} and V^{(3)}. This algorithm is known as Mallat's algorithm, since it is directly related to the multiresolution analysis of Mallat and Meyer.
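The analysis half of Mallat's algorithm is short enough to write out. The sketch below is a minimal Python version that assumes the Haar filters g = (1/√2, 1/√2) and h = (1/√2, −1/√2) and an input whose length is divisible by 2^J, so that no boundary handling is needed; both restrictions are ours, not the text's. Each stage computes α_n^{(m+1)} = Σ_k g_k α^{(m)}_{2n+k} and β_n^{(m+1)} = Σ_k h_k α^{(m)}_{2n+k}, which is (12.128) and (12.129): filtering with the time-reversed filters followed by downsampling by 2. The sequence alpha0 is made-up data, and the function names are hypothetical.

```python
import math

R = 1 / math.sqrt(2)
G = [R, R]     # Haar lowpass g_k (assumed example filter)
H = [R, -R]    # Haar highpass h_k

def analysis_stage(alpha, g=G, h=H):
    """One stage of (12.128)-(12.129): alpha^(m) -> (alpha^(m+1), beta^(m+1)).

    Computes sum_k g_k alpha_{2n+k}, i.e. (g~ * alpha)_{2n} with g~_n = g_{-n}.
    Assumes len(alpha) is even and length-2 filters, so no boundary handling.
    """
    n_out = len(alpha) // 2
    coarse = [sum(g[k] * alpha[2 * n + k] for k in range(len(g))) for n in range(n_out)]
    detail = [sum(h[k] * alpha[2 * n + k] for k in range(len(h))) for n in range(n_out)]
    return coarse, detail

def mallat_analysis(alpha0, levels):
    """Split V^(0) into W^(1), ..., W^(levels) and V^(levels)."""
    details = []
    alpha = list(alpha0)
    for _ in range(levels):
        alpha, beta = analysis_stage(alpha)
        details.append(beta)
    return alpha, details

alpha0 = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
coarse, details = mallat_analysis(alpha0, 3)
print("beta^(1), beta^(2), beta^(3):", details)
print("alpha^(3):", coarse)
```

Since the Haar filters are orthonormal, the total energy of the outputs equals that of alpha0, which gives a quick sanity check.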
Figure 12.45: Splitting of V^{(0)} into W^{(1)}, W^{(2)}, W^{(3)} and V^{(3)}, shown for a sinc multiresolution analysis.



Figure 12.46: Mallat's algorithm. From the initial sequence α_n^{(0)}, all of the wavelet series coefficients are computed through a discrete filter-bank algorithm.
Figure 12.47: Initialization of Mallat's algorithm. The function x(t) is convolved with φ̃(t) = φ(−t) and sampled at t = n.



Initialization How do we initialize Mallat's algorithm, that is, compute the initial sequence α_n^{(0)}? There is no escape from computing the inner products

$$ \alpha_n^{(0)} = \langle x(t), \varphi(t-n)\rangle_t = (\tilde{\varphi} * x)(t)\big|_{t=n}, $$

where φ̃(t) = φ(−t). This is shown in Figure 12.47.

The simplification obtained through this algorithm is the following. Computing inner products involves continuous-time filtering and sampling, which is difficult. Instead of having to compute such inner products at all scales as in (12.123), only a single scale has to be computed, namely the one leading to α_n^{(0)}. All the subsequent inner products are obtained from that sequence, using only discrete-time processing.
Figure 12.48: Initial approximation in Mallat's algorithm. (a) Function x(t). (b) Approximation f^{(0)}(t) with Haar scaling function in V^{(0)} and error e^{(0)}(t) = x(t) − f^{(0)}(t). (c) Same but in V^{(−3)}, or f^{(−3)}(t) and e^{(−3)}(t) = x(t) − f^{(−3)}(t).



The question is: How well does f^{(0)}(t) approximate the function x(t)? The key is that if the error ‖f^{(0)} − x‖ is too large, we can go to finer resolutions f^{(m)}, m < 0, until ‖f^{(m)} − x‖ is small enough. Because of completeness, we know that there is an m such that the initial approximation error can be made as small as we like.

In Figure 12.48 we show two different initial approximations and the resulting errors,

$$ e^{(m)}(t) = x(t) - f^{(m)}(t). $$

Clearly, the smoother the function, the faster the decay of ‖e^{(m)}‖ as m → −∞. Exercise 12.13 explores this further.
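To make the decay of the initial approximation error concrete, the sketch below assumes the Haar scaling function (the indicator of [0, 1)), for which f^{(m)}(t) is the average of x(t) over each interval [n2^m, (n + 1)2^m). The test function x(t) = sin(2πt) on [0, 1), and zero elsewhere, is an illustrative choice, and integrals are replaced by sums over a fine uniform grid. For this continuous function with a bounded derivative, the ratio ‖e^{(m)}‖/‖e^{(m−1)}‖ approaches 2 at fine resolutions, as expected from a piecewise-constant approximation.

```python
import math

N_FINE = 2 ** 14   # fine grid on [0, 1) standing in for continuous time
samples = [math.sin(2 * math.pi * (k + 0.5) / N_FINE) for k in range(N_FINE)]

def haar_projection_error(samples, m):
    """Approximate ||x - f^(m)|| for the Haar scaling function, m <= 0.

    f^(m) replaces x by its average on each interval [n 2^m, (n+1) 2^m);
    x is assumed to vanish outside [0, 1), so only [0, 1) contributes.
    """
    h = 1.0 / len(samples)
    block = int(round(2.0 ** m * len(samples)))   # samples per interval
    err2 = 0.0
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean = sum(chunk) / len(chunk)
        err2 += h * sum((v - mean) ** 2 for v in chunk)
    return math.sqrt(err2)

errors = {m: haar_projection_error(samples, m) for m in range(0, -7, -1)}
for m, e in errors.items():
    print(f"m = {m:2d}: ||e^(m)|| = {e:.5f}")
```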



The Synthesis Problem We have considered the analysis problem, that is, given a function, how to obtain its wavelet coefficients. Conversely, we can also consider the synthesis problem: given a wavelet series representation as in (12.124), how do we synthesize f^{(0)}(t)? One way is to effectively add wavelets at different scales and shifts, with the appropriate weights (12.123).

The other way is to synthesize f^{(0)}(t) as in (12.125), which now involves only linear combinations of a single function φ(t) and its integer shifts. To make matters specific, assume we want to reconstruct f^{(0)} ∈ V^{(0)} from f^{(1)}(t) ∈ V^{(1)} and d^{(1)}(t) ∈ W^{(1)}. There are two ways to write f^{(0)}(t), namely

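$$ f^{(0)}(t) = \sum_{n\in\mathbb{Z}} \alpha_n^{(0)}\,\varphi(t-n) \tag{12.130} $$

$$ f^{(0)}(t) = \frac{1}{\sqrt{2}}\sum_{n\in\mathbb{Z}} \alpha_n^{(1)}\,\varphi(t/2-n) + \frac{1}{\sqrt{2}}\sum_{n\in\mathbb{Z}} \beta_n^{(1)}\,\psi(t/2-n), \tag{12.131} $$

where the latter is the sum of f^{(1)}(t) and d^{(1)}(t). Now,

$$ \alpha_\ell^{(0)} = \langle f^{(0)}(t), \varphi(t-\ell)\rangle_t. $$

Using the two-scale equation (12.127) and its equivalent for ψ(t/2 − n),

$$ \frac{1}{\sqrt{2}}\,\psi(t/2-n) = \sum_{k\in\mathbb{Z}} h_k\,\varphi(t-2n-k), $$

we can write

$$ \alpha_\ell^{(0)} \overset{(a)}{=} \sum_{n\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} \alpha_n^{(1)} g_k \langle \varphi(t-\ell), \varphi(t-2n-k)\rangle_t + \sum_{n\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} \beta_n^{(1)} h_k \langle \varphi(t-\ell), \varphi(t-2n-k)\rangle_t \overset{(b)}{=} \sum_{n\in\mathbb{Z}} \alpha_n^{(1)}\,g_{\ell-2n} + \sum_{n\in\mathbb{Z}} \beta_n^{(1)}\,h_{\ell-2n}, \tag{12.132} $$

where (a) follows from (12.131) using the two-scale equation; and (b) is obtained from the orthogonality of the φ's, since the inner product is 0 unless k = ℓ − 2n. The obtained expression for α_ℓ^{(0)} indicates that the two sequences α^{(1)} and β^{(1)} are upsampled by 2 before being filtered by g and h, respectively. In other words, a two-channel synthesis filter bank produces the coefficients for synthesizing f^{(0)}(t) according to (12.130). The argument above can be extended to any number of scales and leads to the synthesis version of Mallat's algorithm, shown in Figure 12.49.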

Again, the simplification arises since, instead of having to use continuous-time wavelets and scaling functions at many scales, only a single continuous-time prototype function is needed. This prototype function is φ(t) and its shifts, or the basis for V^{(0)}. Because all the coarser spaces are included in V^{(0)}, the result is intuitive; nonetheless, it is remarkable that the multiresolution framework leads naturally to a discrete-time filter-bank algorithm.
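The synthesis side can be sketched the same way. The code below (again assuming the Haar filters, an input length divisible by 2^J, and hypothetical function names) implements (12.132) literally: each coefficient α_n^{(1)} and β_n^{(1)} is upsampled by 2 and spread through g and h. It includes its own copy of the analysis stage so that it runs on its own, and checks that three levels of analysis followed by three levels of synthesis return the original sequence, as expected for an orthonormal filter bank.

```python
import math

R = 1 / math.sqrt(2)
G = [R, R]     # Haar lowpass (assumed example filter)
H = [R, -R]    # Haar highpass

def analysis_stage(alpha, g=G, h=H):
    # alpha^(1)_n = sum_k g_k alpha_{2n+k};  beta^(1)_n = sum_k h_k alpha_{2n+k}
    n_out = len(alpha) // 2
    return ([sum(g[k] * alpha[2 * n + k] for k in range(len(g))) for n in range(n_out)],
            [sum(h[k] * alpha[2 * n + k] for k in range(len(h))) for n in range(n_out)])

def synthesis_stage(coarse, detail, g=G, h=H):
    """(12.132): alpha^(0)_l = sum_n alpha^(1)_n g_{l-2n} + sum_n beta^(1)_n h_{l-2n},
    i.e. upsample both sequences by 2 and filter them with g and h."""
    out = [0.0] * (2 * len(coarse))
    for n, (a, b) in enumerate(zip(coarse, detail)):
        for k in range(len(g)):
            out[2 * n + k] += a * g[k] + b * h[k]
    return out

def mallat_synthesis(coarse, details):
    """Invert a multilevel split: details = [beta^(1), ..., beta^(J)], coarse = alpha^(J)."""
    alpha = list(coarse)
    for beta in reversed(details):
        alpha = synthesis_stage(alpha, beta)
    return alpha

alpha0 = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
details, alpha = [], alpha0
for _ in range(3):
    alpha, beta = analysis_stage(alpha)
    details.append(beta)
rebuilt = mallat_synthesis(alpha, details)
print(rebuilt)   # equals alpha0 up to rounding
```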

12.6.2 Wavelet Frames 
Chapter at a Glance 
Historical Remarks 

TBD 
Figure 12.49: Synthesis of f^{(0)}(t) using Mallat's algorithm. The wavelet and scaling coefficients are fed through a DWT synthesis, generating the sequence α_n^{(0)}. A final continuous-time processing implementing (12.130) leads to f^{(0)}(t).



Property | Lowpass & scaling function | Highpass & wavelet
Filter | $G(z) = \left(\frac{1+z^{-1}}{2}\right)^N R(z)$ | $H(z) = z^{-L+1}\left(\frac{1-z}{2}\right)^N R(-z^{-1})$
Function | $\Phi(\omega) = \prod_{i=1}^{\infty} 2^{-1/2} G(e^{j\omega/2^i})$ | $\Psi(\omega) = 2^{-1/2} H(e^{j\omega/2}) \prod_{i=2}^{\infty} 2^{-1/2} G(e^{j\omega/2^i})$
Two-scale equation | $\Phi(\omega) = 2^{-1/2} G(e^{j\omega/2})\,\Phi(\omega/2)$; $\varphi(t) = \sqrt{2}\sum_n g_n \varphi(2t-n)$ | $\Psi(\omega) = 2^{-1/2} H(e^{j\omega/2})\,\Phi(\omega/2)$; $\psi(t) = \sqrt{2}\sum_n h_n \varphi(2t-n)$
Orthogonality | $\langle \varphi(t), \varphi(t-n)\rangle_t = \delta_n$; $\langle \varphi(t), \psi(t-n)\rangle_t = 0$ | $\langle \psi(t), 2^{-m/2}\psi(2^{-m}t-n)\rangle_t = \delta_m\delta_n$
Smoothness | Can be tested; increases with N | Same as for φ(t)
Moments | Polynomials of degree N − 1 are in span({φ(t − n)}_{n∈ℤ}) | Wavelet has N zero moments
Size and support | support(g) = {0, …, L − 1}; support(φ) = [0, L − 1] | support(h) = {0, …, L − 1}; support(ψ) = [0, L − 1]

Table 12.1: Major properties of scaling function and wavelet based on an iterated filter bank with an orthonormal lowpass filter having N zeros at z = −1 or ω = π.



Further Reading 

Books and Textbooks Daubechies [41]. 

Results on Wavelets For the proof of completeness of Theorem 12.6, see [39, 33].



Exercises with Solutions 

12.1. Proof of Proposition 12.1
Prove Proposition 12.1.
Solution: The overall strategy of the proof is to show that convergence implies that the sum of the even polyphase component of g, Σ_n g_{2n}, equals the sum of the odd polyphase component, Σ_n g_{2n+1}. This implies a zero at z = −1.

To obtain a useful time-domain expression from the z-domain expression describing the iterative process, G^{(J)}(z) = G(z)G^{(J−1)}(z²), let f be the sequence with z-transform G^{(J−1)}(z²). (f is g^{(J−1)} upsampled by 2.) Then

$$ g_n^{(J)} = \sum_{k\in\mathbb{Z}} g_k\,f_{n-k} = \sum_{\ell\in\mathbb{Z}} g_{2\ell}\,f_{n-2\ell} + \sum_{\ell\in\mathbb{Z}} g_{2\ell+1}\,f_{n-(2\ell+1)}, \tag{E12.1-1} $$

by breaking the convolution sum into k = 2ℓ and k = 2ℓ + 1 terms. Now since f_n = 0 for all odd values of n, in the right side of (E12.1-1), the first sum is zero for odd n and the second sum is zero for even n. Thus

$$ g_{2n}^{(J)} = \sum_{\ell\in\mathbb{Z}} g_{2\ell}\,f_{2n-2\ell} = \sum_{\ell=0}^{(L/2)-1} g_{2\ell}\,g_{n-\ell}^{(J-1)}, \tag{E12.1-2a} $$

$$ g_{2n+1}^{(J)} = \sum_{\ell\in\mathbb{Z}} g_{2\ell+1}\,f_{2n-2\ell} = \sum_{\ell=0}^{(L/2)-1} g_{2\ell+1}\,g_{n-\ell}^{(J-1)}, \tag{E12.1-2b} $$

where we have also used the length of g to write finite sums.

Now suppose that lim_{J→∞} φ^{(J)}(t) exists for all t, and denote the limit by φ(t). It must also be true that φ(τ) ≠ 0 for some τ; otherwise we contradict that ‖φ^{(J)}‖ = 1 for every J. With n_J chosen as any function of J such that lim_{J→∞} 2n_J/2^J = τ, we must have

$$ \lim_{J\to\infty} 2^{J/2}\,g_{2n_J}^{(J)} = \lim_{J\to\infty} 2^{J/2}\,g_{2n_J+1}^{(J)}, $$

because both sides equal φ(τ). Multiplying by 2^{J/2} and taking limits in (E12.1-2), and then subtracting, we find

$$ 0 = \sum_{\ell=0}^{(L/2)-1} \left(g_{2\ell} - g_{2\ell+1}\right) \lim_{J\to\infty} 2^{J/2}\,g_{n_J-\ell}^{(J-1)}. $$

Since the limit above exists and equals the same nonzero multiple of φ(τ) for every ℓ ∈ {0, 1, …, (L/2) − 1}, we conclude

$$ 0 = \sum_{\ell\in\mathbb{Z}} \left(g_{2\ell} - g_{2\ell+1}\right) = G(-1). $$
12.2. Proof of Proposition 12.4
Prove Proposition 12.4.
Solution: We need to prove that Φ(ω) decays faster than 1/|ω|, which amounts to proving that

$$ |\gamma(\omega)| = \prod_{i=1}^{\infty} 2^{-1/2}\left|R(e^{j\omega/2^i})\right| < c\,(1+|\omega|)^{N-1-\epsilon}, \tag{E12.2-1} $$

since we have shown that there is a "smoothing term" of order 1/|ω|^N for large ω. Recall that R(e^{jω}) is 2π-periodic and that R(1) = √2. Because |R(e^{jω})| < 2^{N−1/2} by assumption, we can find a constant α such that

$$ |R(e^{j\omega})| \le \sqrt{2}\,(1 + \alpha|\omega|). $$

Using a Taylor series for the exponential,

$$ |R(e^{j\omega})| \le \sqrt{2}\,e^{\alpha|\omega|}. \tag{E12.2-2} $$

Consider now γ(ω) for |ω| < 1, and let us find an upper bound based on the bound on |R(e^{jω})|:

$$ \sup_{|\omega|<1} |\gamma(\omega)| = \sup_{|\omega|<1}\prod_{k=1}^{\infty} 2^{-1/2}\left|R(e^{j\omega/2^k})\right| \overset{(a)}{\le} \sup_{|\omega|<1}\prod_{k=1}^{\infty} e^{\alpha|\omega|/2^k} = \sup_{|\omega|<1} e^{\alpha|\omega|(1/2+1/4+\cdots)} \overset{(b)}{\le} e^{\alpha}, \tag{E12.2-3} $$

where (a) follows from (E12.2-2); and (b) comes from |ω| < 1.
For |ω| ≥ 1, there exists a J ≥ 1 such that

$$ 2^{J-1} \le |\omega| < 2^J. $$

We can then split the infinite product into two parts, namely

$$ \prod_{k=1}^{\infty} 2^{-1/2}\left|R(e^{j\omega/2^k})\right| = \prod_{k=1}^{J} 2^{-1/2}\left|R(e^{j\omega/2^k})\right| \cdot \prod_{k=1}^{\infty} 2^{-1/2}\left|R(e^{j\omega/2^{J+k}})\right|. $$

Because |ω| < 2^J, we can bound the second product by e^α according to (E12.2-3). The first product has J terms and can be bounded by 2^{−J/2}·B^J, where B = sup_ω |R(e^{jω})|. Using B < 2^{N−1/2}, we can upper bound the first product by (2^{N−1−ε})^J for some ε > 0. This leads to

$$ \sup_{2^{J-1}\le|\omega|<2^J} |\gamma(\omega)| \le c'\,2^{J(N-1-\epsilon)} \le c''\,(1+|\omega|)^{N-1-\epsilon}, $$

where we used the fact that |ω| is between 2^{J−1} and 2^J. Thus, the growth of |γ(ω)| is sufficiently slow, and therefore Φ(ω) decays faster than 1/|ω|, proving continuity of φ(t).
12.3. Multiresolution Analysis with a Riesz Basis for V^{(0)}

Consider a multiresolution analysis with a Riesz basis {θ(t − n)}_{n∈ℤ} for V^{(0)}. Then there exists a scaling function φ(t) that satisfies the two-scale equation (12.48) and {φ(t − n)}_{n∈ℤ} is an orthonormal basis for V^{(0)}.
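Solution: Since θ(t) and its integer shifts form a Riesz basis, Θ(ω) satisfies

$$ 0 < \sum_{k\in\mathbb{Z}} |\Theta(\omega+2\pi k)|^2 < \infty. $$

We create a new function φ(t) with the Fourier transform

$$ \Phi(\omega) = \frac{\Theta(\omega)}{\left(\sum_{k\in\mathbb{Z}} |\Theta(\omega+2\pi k)|^2\right)^{1/2}}, \tag{E12.3-1} $$

which satisfies

$$ \sum_{k\in\mathbb{Z}} |\Phi(\omega+2\pi k)|^2 \overset{(a)}{=} \frac{1}{\sum_{m\in\mathbb{Z}} |\Theta(\omega+2\pi m)|^2}\sum_{k\in\mathbb{Z}} |\Theta(\omega+2\pi k)|^2 = 1, $$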

where in (a) we pulled the denominator in front of the sum since it is 2π-periodic. According to (3.73d), φ(t) is thus orthogonal to its integer shifts. The fact that it satisfies a two-scale equation was shown in (12.8) for the Haar case, but the argument based on the inclusion of V^{(0)} in V^{(−1)} is general.



Exercises 

12.1. Sinc Function as an Infinite Product
Prove that

$$ \frac{\sin t}{t} = \prod_{k=1}^{\infty} \cos\frac{t}{2^k}, $$

using the following:

$$ \lim_{t\to 0}\frac{\sin t}{t} = 1, \qquad \sin t = 2\sin\frac{t}{2}\cos\frac{t}{2}. $$

The inspiration for this problem comes from [144]. (Hint: Use the trigonometric identity recursively.)

12.2. Fourier Domain Iteration of Haar Filter

Consider the Fourier-domain iteration of the stretched Haar filter

$$ G(z) = \frac{1}{\sqrt{2}}\left(1 + z^{-3}\right). $$

(i) Verify that

$$ \Phi(\omega) = e^{-j3\omega/2}\,\frac{\sin(3\omega/2)}{3\omega/2}. $$

(ii) Verify that each finite iteration is of norm 1, while the limit is not, showing failure of L² convergence.

12.3. Multiscale Equation

Based on the two-scale equation (12.47), verify the expressions (12.71a) and (12.72a),

$$ \Phi(\omega) = 2^{-k/2}\,G^{(k)}(e^{j\omega/2^k})\,\Phi(\omega/2^k), $$

$$ \Psi(\omega) = 2^{-k/2}\,H^{(k)}(e^{j\omega/2^k})\,\Phi(\omega/2^k), $$

as well as their time-domain equivalents given in (12.71b) and (12.72b).

12.4. Scaling Behavior of Wavelet Coefficients around Singularities

In Proposition 12.8, it is stated that wavelet coefficients close to singularities of order k behave as

$$ \beta^{(m)} \sim 2^{m(k-1/2)} $$

for m → −∞. This was shown for k = 0 and 1 in the text. Extend the method used to prove the case k = 1 to include larger k's and thus prove (12.84) in general.

12.5. Best Least Squares Approximation in the Haar Case

Consider a function f_{−1}(t) in V_{−1}, the space of functions constant over half-integer intervals. Show that the best least-squares approximation f_0(t) in V_0, the space of functions constant over integer intervals, is given by the average over two successive intervals.

12.6. Two-Scale Equation for Piecewise Linear Spaces

In Example 12.6 we saw various bases for piecewise linear spaces. From the fact that $\theta(t)$ satisfies a two-scale equation, derive the two-scale equation for the orthonormal scaling function $\varphi(t)$.

(i) Give the two-scale equation for $\theta(t)$.
(ii) From the expression for $\Phi(\omega)$ in (E12.3-1) and the two-scale equation for $\Theta(\omega)$, derive the two-scale equation for $\Phi(\omega)$.
(iii) Derive the expression for $G(e^{j\omega})$, the Fourier transform of the sequence of coefficients of the two-scale equation.

Note: This can be done for either the causal case, $\varphi_c(t)$, or the symmetric case, $\varphi_s(t)$.
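For part (i), assume that $\theta(t)$ denotes the symmetric hat function $\max(0, 1 - |t|)$. This is an assumption, since Example 12.6 is not reproduced here, and the causal version is a shift of it. Under that assumption, the candidate relation $\theta(t) = \tfrac12\theta(2t+1) + \theta(2t) + \tfrac12\theta(2t-1)$ can be checked on a grid. In the normalization $\theta(t) = \sqrt2\sum_n g_n \theta(2t-n)$, this corresponds to $g = \tfrac{1}{2\sqrt2}(1, 2, 1)$ on $n = -1, 0, 1$.

```python
import numpy as np


def theta(t):
    """Symmetric hat function: 1 - |t| on |t| < 1, zero elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(t))


n = np.array([-1, 0, 1])
g = np.array([1.0, 2.0, 1.0]) / (2 * np.sqrt(2))  # candidate two-scale coefficients

t = np.linspace(-2.0, 2.0, 4001)
rhs = np.sqrt(2) * sum(gn * theta(2 * t - k) for gn, k in zip(g, n))
print("max deviation:", np.max(np.abs(rhs - theta(t))))


def G(w):
    """DTFT of g: sum_n g_n exp(-j w n)."""
    return np.sum(g * np.exp(-1j * w * n))


# A lowpass two-scale filter should give sqrt(2) at w = 0 and 0 at w = pi
print(abs(G(0.0)), abs(G(np.pi)))
```

These are the coefficients for $\theta$, not for the orthonormal $\varphi$. Parts (ii) and (iii) still require combining them with the orthonormalization in (E12.3-1).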

12.7. Wavelets for Piecewise Linear Spaces

Given the coefficients of the two-scale equation for the orthonormal scaling function $\varphi(t)$ (see Exercise 12.6), derive an expression for the wavelet, based on the Fourier expression

$$\Psi(\omega) = \frac{1}{\sqrt{2}}\, e^{-j\omega/2}\, G^*\!\left(e^{j(\omega/2+\pi)}\right) \Phi(\omega/2),$$

where $G(e^{j\omega})$ is the discrete-time Fourier transform of the coefficients of the two-scale equation for $\varphi(t)$.
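In the time domain, the Fourier expression above amounts to $\psi(t) = \sqrt2\sum_n h_n\varphi(2t-n)$ with $h_n = (-1)^{n-1} g^*_{1-n}$. The orthonormal piecewise-linear filter has infinite support, so the sketch below substitutes a finite real filter, the 4-tap Daubechies lowpass filter. On that stand-in, it checks the time/frequency correspondence and the orthogonality $\sum_n g_n h^*_{n-2\ell} = 0$. With a truncated piecewise-linear filter, the same checks would hold only approximately. The helper names are hypothetical.

```python
import numpy as np

# Stand-in lowpass filter: 4-tap Daubechies (orthonormal, real), indices 0..3
s3 = np.sqrt(3.0)
g = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
g_idx = np.arange(len(g))

# h_n = (-1)^(n-1) conj(g_{1-n}): tap g_k moves to index n = 1 - k with sign (-1)^k
h_idx = 1 - g_idx
h = (-1.0) ** g_idx * np.conj(g)


def dtft(c, idx, w):
    """Evaluate sum_n c_n exp(-j w n) at each frequency in w."""
    return np.exp(-1j * np.outer(w, idx)) @ c


w = np.linspace(-np.pi, np.pi, 257)
H_time = dtft(h, h_idx, w)
H_freq = np.exp(-1j * w) * np.conj(dtft(g, g_idx, w + np.pi))  # e^{-jw} G^*(e^{j(w+pi)})
print("max |H_time - H_freq|:", np.max(np.abs(H_time - H_freq)))


def inner_shift(a, a_idx, b, b_idx, shift):
    """sum_n a_n conj(b_{n - shift}) for finitely supported sequences."""
    b_map = dict(zip((b_idx + shift).tolist(), b))
    return sum(an * np.conj(b_map.get(k, 0.0)) for an, k in zip(a, a_idx.tolist()))


print([round(abs(inner_shift(g, g_idx, h, h_idx, 2 * ell)), 12) for ell in range(-3, 4)])
```

The cross-orthogonality holds for any real or complex $g$ with this alternating flip, because the terms cancel in pairs $n \leftrightarrow 1 + 2\ell - n$. Unit norm of $h$ follows from unit norm of $g$.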
12.8. Sinc Multiresolution Analysis

Consider the sequence of spaces of bandlimited functions

$$V_m = \mathrm{BL}([-2^{-m}\pi, 2^{-m}\pi]),$$

with an orthonormal basis for $V_0$ given by

$$\varphi(t) = \frac{\sin(\pi t)}{\pi t}$$

and its integer shifts.

(i) Verify that the axioms of multiresolution analysis in Definition 12.10 are satisfied.
(ii) Given the embedding $V_0 \subset V_{-1}$, derive the coefficients $g_n$ of the two-scale equation

$$\varphi(t) = \sqrt{2}\sum_{n\in\mathbb{Z}} g_n\, \varphi(2t-n).$$

(iii) Derive the wavelet based on the highpass filter$^{162}$ and give its expressions in both the time and frequency domains.
(iv) Verify that the wavelet spaces are

$$W_m = \mathrm{BL}([-2^{-m+1}\pi, -2^{-m}\pi] \cup [2^{-m}\pi, 2^{-m+1}\pi]).$$
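One route to part (ii) is the sampling theorem. Since $\varphi \in V_{-1}$ is bandlimited to $[-2\pi, 2\pi]$, it is recovered from its samples at $t = n/2$, which suggests the candidate $g_n = \varphi(n/2)/\sqrt2$. The sketch below checks this candidate numerically with a truncated sum; the truncation length and test points are arbitrary. It also checks that $\sum_n |g_n|^2 \approx 1$. NumPy's `np.sinc(x)` computes $\sin(\pi x)/(\pi x)$, which is exactly $\varphi$ here.

```python
import numpy as np

N = 2001                          # truncation: use n = -N..N
n = np.arange(-N, N + 1)
g = np.sinc(n / 2) / np.sqrt(2)   # candidate g_n = phi(n/2) / sqrt(2)

t = np.linspace(-3.3, 3.7, 15)
# sqrt(2) * sum_n g_n phi(2t - n), evaluated for all t at once
recon = np.sqrt(2) * (np.sinc(2 * t[:, None] - n[None, :]) @ g)
print("max reconstruction error:", np.max(np.abs(recon - np.sinc(t))))
print("sum of g_n^2:", np.sum(g**2))
```

The coefficients decay only like $1/n$, so the energy sum converges slowly, with a truncation error on the order of $1/N$. The pointwise reconstruction converges much faster, because the tail terms alternate in sign.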

12.9. Meyer Scaling Function and Wavelet

In Example 12.7 we derived one of the simplest Meyer wavelets, based on a continuous $\Phi(\omega)$. We generalize this to smoother $\Phi(\omega)$'s. For this purpose, introduce a helper function $\alpha(x)$ that is 0 for $x \le 0$ and 1 for $x \ge 1$, and satisfies

$$\alpha(x) + \alpha(1-x) = 1 \quad \text{for } 0 \le x \le 1. \qquad \text{(P12.9-1)}$$

An example of such a function is

$$\alpha(x) = \begin{cases} 0, & x < 0; \\ 3x^2 - 2x^3, & 0 \le x \le 1; \\ 1, & x > 1. \end{cases} \qquad \text{(P12.9-2)}$$

Construct the scaling function $\Phi(\omega)$ as

$$\Phi(\omega) = \sqrt{\alpha\!\left(2 - \frac{3|\omega|}{2\pi}\right)}. \qquad \text{(P12.9-3)}$$

(i) Verify that $\alpha(x)$ in (P12.9-2) satisfies (P12.9-1) and that it has a continuous first derivative.
(ii) Verify that $\Phi(\omega)$ given by (P12.9-3) satisfies

$$\sum_{k\in\mathbb{Z}} |\Phi(\omega + 2\pi k)|^2 = 1$$

and thus that $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$ is an orthonormal set. Hint: start by using $\alpha(x)$ given in (P12.9-2), and then a general $\alpha(x)$ as in (P12.9-1).
(iii) With $V_0 = \mathrm{span}(\{\varphi(t-n)\}_{n\in\mathbb{Z}})$ and $V_m$ defined the usual way, prove that $V_0 \subset V_{-1}$.
(iv) Show that there exists a $2\pi$-periodic function $G(e^{j\omega})$ such that

$$\Phi(\omega) = \frac{1}{\sqrt{2}}\, G(e^{j\omega/2})\, \Phi(\omega/2) \qquad \text{(P12.9-4)}$$

and that

$$G(e^{j\omega}) = \sqrt{2}\sum_{k\in\mathbb{Z}} \Phi(2\omega + 4\pi k). \qquad \text{(P12.9-5)}$$



162 This differs from TBD in that we skip the shift by L, since here we have a two-sided infinite 
filter impulse response. 
(v) Verify (12.86b) by showing that

$$\langle f, \varphi_{m,n} \rangle = 0, \quad m, n \in \mathbb{Z},$$

implies necessarily that $f = 0$.
(vi) Verify (12.86c) by showing that if

$$f \in \bigcap_{m\in\mathbb{Z}} V_m,$$

then necessarily $f = 0$.
(vii) From (P12.9-5) and the usual construction of the wavelet, give an expression for $\Psi(\omega)$ in terms of $\Phi(\omega)$.
(viii) For $\alpha(x)$ given in (P12.9-2), what decay is expected for $\varphi(t)$ and $\psi(t)$?
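Parts (i), (ii), and (iv) lend themselves to a numerical sanity check. The sketch below evaluates $\Phi(\omega)$ from (P12.9-3) with the polynomial $\alpha(x)$ of (P12.9-2) and checks (P12.9-1). It then sums $|\Phi(\omega + 2\pi k)|^2$ over a few shifts; only $|k| \le 2$ can contribute for $\omega \in [-\pi, \pi]$, since $\Phi$ vanishes for $|\omega| \ge 4\pi/3$. Finally, it checks the two-scale relation (P12.9-4) with $G$ built from a truncated version of (P12.9-5). Grid sizes, the truncation `K`, and the function names are arbitrary choices.

```python
import numpy as np


def alpha(x):
    """Helper function (P12.9-2): 0 for x < 0, 3x^2 - 2x^3 on [0, 1], 1 for x > 1."""
    x = np.clip(x, 0.0, 1.0)
    return 3 * x**2 - 2 * x**3


def Phi(w):
    """Meyer scaling function in frequency, (P12.9-3)."""
    return np.sqrt(alpha(2 - 3 * np.abs(w) / (2 * np.pi)))


def G(w, K=3):
    """Filter from (P12.9-5), with the sum over k truncated to |k| <= K."""
    k = np.arange(-K, K + 1)
    return np.sqrt(2) * np.sum(Phi(2 * np.asarray(w)[..., None] + 4 * np.pi * k), axis=-1)


x = np.linspace(0, 1, 1001)
err_p1 = np.max(np.abs(alpha(x) + alpha(1 - x) - 1))
print("(P12.9-1) deviation:", err_p1)

w = np.linspace(-np.pi, np.pi, 2001)
k = np.arange(-2, 3)
periodized = np.sum(Phi(w[:, None] + 2 * np.pi * k) ** 2, axis=1)
err_orth = np.max(np.abs(periodized - 1))
print("orthonormality deviation:", err_orth)

w = np.linspace(-6, 6, 2001)
two_scale = G(w / 2) * Phi(w / 2) / np.sqrt(2)
err_2s = np.max(np.abs(two_scale - Phi(w)))
print("(P12.9-4) deviation:", err_2s)
```

The two-scale check works because $\Phi(\omega/2) = 1$ on the whole support $|\omega| < 4\pi/3$ of $\Phi(\omega)$. This is also the key observation behind (iii) and (iv).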

12.10. Admissibility of Daubechies Wavelets 

Show that all orthonormal and compactly supported wavelets from the Daubechies family satisfy the admissibility condition (12.101).

12.11. Finite-Dimensional Reproducing Kernels

Consider an $M$-by-$N$ matrix $T$, $M > N$, that maps vectors $x$ from $\mathbb{R}^N$ into vectors $y$ living on a subspace $S$ of $\mathbb{R}^M$.

(i) For $T$ having orthonormal columns, that is, $T^T T = I$, and $K = T T^T$ (see (12.113)), what can you say about the vector

$$\hat{y} = K y,$$

where $y$ is an arbitrary vector from $\mathbb{R}^M$?
(ii) For $T$ having $N$ linearly independent (but not necessarily orthonormal) columns, give a simple test to check whether a vector $y$ in $\mathbb{R}^M$ belongs to $S$ (see (12.114)).
(iii) In case (ii) above, indicate how to compute the orthogonal projection of an arbitrary vector $y$ in $\mathbb{R}^M$ onto $S$.
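Here is a numerical sketch of (i) through (iii). For the non-orthonormal case, it assumes the standard least-squares form $K = T(T^TT)^{-1}T^T$, which reduces to $TT^T$ when $T^TT = I$. The sizes, the random matrices, and the helper `in_subspace` are arbitrary illustrative choices. A vector lies in $S$ exactly when $Ky = y$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 3

# (i) Orthonormal columns via QR: K = T T^T is the orthogonal projector onto S
T_orth, _ = np.linalg.qr(rng.standard_normal((M, N)))
K = T_orth @ T_orth.T
y = rng.standard_normal(M)
y_hat = K @ y
print(np.allclose(K @ y_hat, y_hat))           # y_hat already lies in S
print(np.allclose(T_orth.T @ (y - y_hat), 0))  # residual is orthogonal to S

# (ii)/(iii) Linearly independent but non-orthonormal columns
T = rng.standard_normal((M, N))
K_gen = T @ np.linalg.solve(T.T @ T, T.T)      # T (T^T T)^{-1} T^T


def in_subspace(v, K, tol=1e-10):
    """Membership test for S: v is in S iff K v = v (up to a relative tolerance)."""
    return np.linalg.norm(K @ v - v) <= tol * max(1.0, np.linalg.norm(v))


print(in_subspace(T @ rng.standard_normal(N), K_gen))  # built from S: True
print(in_subspace(rng.standard_normal(M), K_gen))      # generic vector: False
```

Solving with $T^TT$ rather than forming its inverse explicitly is the usual numerically safer choice.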

12.12. Reproducing Kernel Formula for the Wavelet Transform 

Show the converse part of Proposition 12.15. That is, show that if a function $F(a, b)$ satisfies (12.116), then there exists a function $f(t)$ with the wavelet transform equal to $F(a, b)$.

12.13. Initialization of Mallat's Algorithm 

Create an approximation problem for a smooth function (for example, bounded with bounded derivative) and compare the rates of decay for Haar and piecewise linear approximation. Details later.

12.14. Biorthogonal Multiresolution Analysis

Consider the hat function

$$\varphi(t) = \begin{cases} 1 - |t|, & |t| < 1; \\ 0, & \text{else}, \end{cases}$$

and the family $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$.

(i) Characterize $V_0 = \mathrm{span}(\{\varphi(t-n)\}_{n\in\mathbb{Z}})$.
(ii) Evaluate the deterministic autocorrelation sequence

$$a_n = \langle \varphi(t), \varphi(t-n) \rangle,$$

verifying that $\varphi(t)$ is not orthogonal to its integer translates.
(iii) Define $V_m$, the usual scaled versions of $V_0$. Verify that the axioms of multiresolution analysis given in Section 12.3.3 are satisfied.
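Here is a numerical check of (ii). A midpoint-rule integral over a fine grid gives the autocorrelation values. Integrating the piecewise polynomials directly gives $a_0 = 2/3$, $a_{\pm 1} = 1/6$, and $a_n = 0$ otherwise, and the numerical values should agree. The nonzero $a_{\pm1}$ is exactly the failure of orthogonality. Equivalently, $\sum_k |\Phi(\omega + 2\pi k)|^2 = \sum_n a_n e^{-j\omega n} = 2/3 + (1/3)\cos\omega \ne 1$. The grid size is an arbitrary choice.

```python
import numpy as np


def hat(t):
    """Hat function: 1 - |t| on |t| < 1, zero elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(t))


num = 50_000
dt = 5.0 / num
t = -2.0 + dt * (np.arange(num) + 0.5)  # cell midpoints covering [-2, 3]

# a_n = <phi(t), phi(t - n)> via the midpoint rule
a = {n: np.sum(hat(t) * hat(t - n)) * dt for n in range(-2, 3)}
for n, value in a.items():
    print(f"a[{n:+d}] = {value:.6f}")
```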

12.15. Geometry of Biorthogonal Multiresolution Analysis

Consider the biorthogonal family $\{\varphi(t), \tilde{\varphi}(t), \psi(t), \tilde{\psi}(t)\}$ as defined in (12.68b), (12.69b), TBD and TBD, as well as the associated multiresolution spaces $\{V_m, \tilde{V}_m, W_m, \tilde{W}_m\}$.

(i) Verify that

$$V_m = V_{m+1} \oplus W_{m+1}$$

and similarly for $\tilde{V}_m$.
(ii) Show that $V_m$ and $W_m$ are not orthogonal to each other.
(iii) Verify the orthogonality relations

$$W_m \perp \tilde{V}_m \quad \text{and} \quad \tilde{W}_m \perp V_m.$$

(iv) Show further that

$$W_m \perp \tilde{W}_{m+k}, \quad k \neq 0.$$

Hint: Show this first for $k = 1, 2, \ldots$ using part (i).
13.1 Introduction

Chapter Outline 

13.2 Abstract Models and Approximation 

13.2.1 Local Fourier and Wavelet Approximations of Piecewise 
Smooth Functions 

13.2.2 Wide-Sense Stationary Gaussian Processes 

13.2.3 Poisson Processes 

13.3 Empirical Models 

13.3.1 $\ell^p$ Models

13.3.2 Statistical Models 

13.4 Estimation and Denoising 

13.4.1 Connections to Approximation 

13.4.2 Wavelet Thresholding and Variants 

13.4.3 Frames 

13.5 Compression 

13.5.1 Audio Compression 

13.5.2 Image Compression 

13.6 Inverse Problems 

13.6.1 Deconvolution 

13.6.2 Compressed Sensing 
Chapter at a Glance 

TBD 

Historical Remarks 

TBD 

Further Reading 

TBD 
Appendix

13.A Elements of Source Coding

13.A.1 Entropy Coding

13.A.2 Quantization

13.A.3 Transform Coding